INTRODUCTION

Breast  cancer  (BC)  is  a  heterogeneous  disease  with  varying  clinical  behavior  and 
response  to  therapy  that  cannot  be  predicted  based  on  existing  clinical  and  pathologic 
classifications.  This  has  led  to  an  intense  effort  to  understand  the  biology  of  BC  and  a  search  for 
genes  and  gene  products  that  play  a  major  role  in  tumor  development  and  progression.  A 
comprehensive  analysis  of  gene  expression  can  provide  crucial  clues  concerning  the  intrinsic 
biology  of  a  cancer  and  ultimately  contribute  to  diagnostic  decisions  and  therapies  tailored  to  an 
individual  patient.  New,  high-throughput  mRNA  analysis  platforms,  such  as  DNA  microarrays, 
allow  comprehensive  measurement  of  gene  expression  and  can  produce  large  data  sets  with  the 
potential  to  provide  novel  insights  into  biology  at  the  molecular  level.  Our  studies  are  designed  to 
identify  gene  expression  profiles  that  are  associated  with  tumor  progression  and  can  be  used  for 
discrimination  of  clinically  relevant  subgroups  of  BC.  An  understanding  of  the  mechanisms  that 
drive  progression  of  BC  will  provide  biomarkers  for  diagnosis,  risk  stratification  and  therapeutic 
targets  that  could  have  an  enormous  impact  on  the  care  of  these  patients.  The  specific  aims  of 
our project are: 1) To identify the genes, gene expression profiles and molecular pathways
associated with metastatic BC using microarray-based gene expression analysis and comparison
of concurrent primary and metastatic tumors within the same patients. 2) To identify gene
expression differences associated with clinical outcome by comparison of comprehensive
expression profiles from stage- and histology-matched primary BCs in patients with long-term
recurrence-free survival and patients who die of metastatic disease.

BODY 

We have completed all tasks originally proposed. Specifically, we have identified and
processed all tissue samples planned for specific aims 1 and 2. RNA has been isolated, and
labeled cRNA target from these samples has been subjected to gene expression analysis using
oligonucleotide microarrays with features for over 33,000 genes/ESTs. Hierarchical clustering
of the gene expression data showed that most samples grouped according to estrogen receptor
(ER) status. In addition, the matched primary carcinomas and lymph node metastases have global
expression profiles more similar to each other than to other breast cancers. Both unsupervised
and supervised analyses were used to identify genes differentially expressed among samples and
molecular subclasses of breast cancers. We identified a unique subclass of ER-negative breast
carcinoma and characterized its molecular phenotype (Doane et al., appendix). In addition,
formal statistical testing was used to identify genes with marked changes in expression during
progression. Lymph node metastases in particular showed significant decreases in the expression
of many genes corresponding to extracellular matrix proteins and proteases when compared to
matched primaries. Further expression changes in a variety of genes were associated with distant
metastases. Immunohistochemistry and in situ hybridization were used to validate and extend
findings. A variety of in vitro and in vivo models have been used to elucidate specific molecular
correlations (Dechow et al., Bhargava et al., Minn et al., Kang et al., appendix).

KEY  RESEARCH  ACCOMPLISHMENTS 

1)  Evaluation  and  selection  of  tumor  cases  to  be  used  for  specific  aims  1  and  2 

2)  Microdissection  of  frozen  tissue,  RNA  preparation  and  analysis  of  all  samples. 

3)  Microarray-based gene expression analysis of all samples.

4)  Analysis of data from specific aims 1 and 2 and identification of differentially expressed
genes.

5)  Validation  of  differential  expression  at  the  RNA  and  protein  level  for  select  genes. 

6)  Identification  of  genes  that  participate  in  distinct  organ-specific  metastasis 

7)  Identification of a unique estrogen receptor-negative breast cancer subset characterized
by a hormonally regulated transcriptional program and proliferative response to androgen.

Donaton M, Giri D, Olshen A, Panageas K, Levcovici S, Lai P, Brogi E, Hudis C, VanZee K,
Tan L, Gerald W. Comprehensive gene expression analysis of paired primary breast carcinomas
and lymph node metastases. Abstract presentation, American Association for Cancer Research,
2003.
carcinomas. Abstract presentation United States and Canadian Academy of Pathology, 2003.

Lai P, Donaton M, Giri D, Chen B, Gerald W. Molecular Diagnosis of Breast Cancer
Therapeutic Biomarkers Using Oligonucleotide Arrays. Abstract presentation USCAP 2005.

Doane A, Danso M, Lai P, Donaton M, Zhang L, and Gerald W. Estrogen Receptor-Negative
Breast Cancer with an Active Hormone Response Pathway: Therapeutic Implications. Abstract
presentation AACR 2005.

CONCLUSIONS 

Comprehensive  gene  expression  analysis  of  archived  breast  cancer  samples  is  feasible. 
Molecular  subgroups  of  breast  carcinoma  identified  by  gene  expression  analysis  are  strongly 
influenced  by  the  ER  status  of  the  tumor.  The  gene  expression  profiles  of  paired  primary  and 
metastatic  breast  carcinomas  are  remarkably  similar  and  the  differences  observed  appear  to 
reflect  different  microenvironments  and  tissue  specific  responses  to  tumor  growth.  Taken 
together,  these  results  suggest  that  molecular  features  of  breast  carcinomas  metastatic  to  lymph 
nodes  are  largely  present  in  the  primary  tumor  and  might  have  been  acquired  early  in 
tumorigenesis. 

Analysis of primary tumors from patients with differing outcomes demonstrated a
relatively small number of genes associated with progression. However, several have interesting
functional attributes that could affect tumor biology. Functional analysis is providing strong
evidence that some of these differentially expressed genes will provide clinically useful
biomarkers and therapeutic targets.

REFERENCES 

None 

APPENDICES 

Dechow  TN,  Pedranzini  L,  Leitch  A,  Leslie  K,  Gerald  WL,  Linkov  I,  Bromberg  JF.  Requirement 
of  matrix  metalloproteinase-9  for  the  transformation  of  human  mammary  epithelial  cells  by 
Stat3-C.  Proc  Natl  Acad  Sci  USA.  2004  101(29):  10602-7. 

Bhargava  R,  Gerald  W,  Lai  P,  and  Chen  B.  Epidermal  growth  factor  receptor  (EGFR)  gene 
amplification  in  breast  cancer:  Correlation  with  mRNA  and  protein  expression  and  absence  of 
common activating mutations. Mod Pathol. Epub ahead of print 2005.

Minn  A,  Kang  Y,  Serganova  I,  Gupta  G,  Giri  D,  Doubrovin  M,  Ponomarev  V,  Gerald  W, 
Blasberg  R,  Massague  J.  Distinct  organ-specific  metastasis  potential  of  individual  breast  cancer 
cells  and  primary  tumors.  J  Clin  Invest.  115:  44-55,  2005 

Minn  A,  Gupta  G,  Siegel  P,  Bos  P,  Shu  W,  Giri  D,  Viale  A,  Olshen  A,  Gerald  W,  Massague  J. 
Genes  that  predict  and  mediate  breast  cancer  metastasis  to  the  lung.  In  press  Nature. 

Yibin Kang, Wei He, Gaorav P. Gupta, Shaun Tulley, Inna Serganova, Chang-Rung Chen, Katia
Manova-Todorova, Ronald Blasberg, William L. Gerald and Joan Massague. The Smad4 Tumor
Suppressor Mediates Pro-Metastatic TGFβ Gene Responses in Breast Cancer Bone Metastasis.
Submitted


Doane A, Danso M, Lai P, Donaton M, Zhang L, Hudis C, and Gerald W. An estrogen
receptor-negative breast cancer subset characterized by a hormonally regulated transcriptional
program and proliferative response to androgen. Submitted


Requirement  of  matrix  metalloproteinase-9  for  the 
transformation  of  human  mammary  epithelial  cells 
by  Stat3-C 

Tobias N. Dechow, Laura Pedranzini, Andrea Leitch, Kenneth Leslie, William L. Gerald, Irina Linkov,
and Jacqueline F. Bromberg

Laboratory of Molecular Cell Biology, The Rockefeller University, New York, NY 10021; and Departments of Medicine and Pathology,
Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021

Communicated  by  James  E.  Darnell,  Jr.,  The  Rockefeller  University,  New  York,  NY,  June  9,  2004  (received  for  review  December  2,  2003) 


Persistently activated Stat3 is found in many different cancers,
including approximately 60% of breast tumors. Here, we demonstrate that a
constitutively activated Stat3 transforms immortalized human
mammary epithelial cells and that this oncogenic event requires
the activity of matrix metalloproteinase-9 (MMP-9). By
immunohistochemical analysis, we observe a positive correlation
between strong MMP-9 expression and tyrosine phosphorylated
Stat3 in primary breast cancer specimens. These results
demonstrate a relationship between activated Stat3 and MMP-9 in breast
oncogenesis.


Signal transducer and activator of transcription (STAT)
proteins are a family of transcription factors that are normally
inactive within the cytoplasm of cells and become activated by
tyrosine phosphorylation in response to cytokines and growth
factors. Dimerization through reciprocal SH2-phospho-tyrosine
interactions of tyrosine-phosphorylated STATs leads to their
accumulation in the nucleus where they bind DNA and activate
transcription. STAT dimers are dephosphorylated within the
nucleus and transported back to the cytoplasm (1). In normal
cells, STAT activation is transient whereas, in a large number of
primary tumors and cancer-derived cell lines, STAT proteins (in
particular Stat3) remain activated by persistently activated
tyrosine kinases and/or a decrease in the negative regulators of
STAT dephosphorylation (2). Introduction of dominant negative
Stat3 or Stat3 antisense oligonucleotides leads to induction
of apoptosis, decreased angiogenesis, or growth arrest of
cancer-derived cell lines, including breast cancer cells (2, 3). In addition,
a constitutively active mutant form of Stat3, Stat3-C, which is
dimerized by cysteine-cysteine residues instead of pY-SH2
interactions, can transform immortalized cultured rodent
fibroblasts (4). Stat3 is persistently tyrosine phosphorylated (by
immunohistochemical and biochemical analyses) in 30-60% of
primary breast cancer specimens (3, 5-7), leading us to test
whether Stat3-C could mediate transformation of immortalized
human mammary epithelial cell lines (HMECs), possibly more
relevant to human tumor biology.

We report here that Stat3-C can transform immortalized
HMECs and have determined that matrix metalloproteinase-9
(MMP-9) activity is increased in the Stat3-C-containing cell lines
and that this activity is required for Stat3-C-mediated
anchorage-independent growth.

Experimental  Procedures 

Cells and Growth Conditions. MCF-10A cells were obtained from
the American Type Culture Collection (ATCC). Immortalized
HMECs (referred to as HMLHT) and H-rasV12-transformed
HMLHT cells were obtained from R. A. Weinberg (Massachusetts
Institute of Technology, Boston) (8). Stat3-C and
v-src-expressing cells were generated by retroviral infection as
described (9). Puromycin (2 μg/ml) was added for selection. Cell
proliferation was determined after 7 days by using alamarBlue
(BioSource International, Camarillo, CA).

Plasmids and Reagents. pBabe-Stat3-C was generated by inserting
a BamHI site 3' of Stat3-C RcCMV (4) and subcloning the
BamHI cDNA insert into pBabe-Puro (10). pBabe-v-src was from
H. Hanafusa (Osaka Bioscience Institute, Osaka, Japan). The
MMP-9 promoter luciferase pGL2 construct was obtained from
M. Seiki (Kanazawa University, Kanazawa, Japan) (11). The
MMP-2/9 inhibitor II (Calbiochem) was resuspended in DMSO
(50 μM) and subsequently diluted in PBS for further use.
Recombinant MMP-9 was obtained from R & D Systems.

Soft Agar Assays. Soft agar assays were performed as described
(8). HMLHT cells (2 × 10^4) and MCF-10A cells (2 × 10^5) were
seeded per six-well in triplicate in 3 ml of top-agar. Colonies were
stained with 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl
tetrazolium bromide (MTT) (Sigma).

S.C. Tumorigenicity Assays. Six- to 8-week-old
immunocompromised nonobese diabetic (NOD)/severe combined
immunodeficient (SCID) mice (Taconic) were γ-irradiated with 300 rad 4 h
before injection to suppress natural killer cell activity. Cells
(5 × 10^6) were harvested, mixed with an equal volume of Matrigel
(Becton Dickinson), and injected in the mouse flank. Tumor size
was measured once a week. Mice were killed after 10 weeks of
observation or after the tumor grew to approximately 600 mm^3. Nuclear
extracts were isolated from the tumors and analyzed for the
presence of Stat3-C by anti-Flag Western blots.

Gene Array Analysis. For gene array analysis, see Supporting
Materials and Methods, which is published as supporting
information on the PNAS web site.

RT-PCR for MMP-9. RT-PCR for MMP-9 was performed by
preparation of total RNA with RNeasy (Qiagen, Valencia,
CA) followed by RT (Clontech). PCR reactions were
performed by using MMP-9 primers (5'-primer
GATGCGTGGAGAGTCGAAAT; 3'-primer
CACCAAACTGGATGACGATG). GAPDH primers were used for loading
control as described (12).
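As a quick sanity check on the two MMP-9 primer sequences quoted above, the following minimal sketch computes their length, GC fraction, and a rough melting temperature. The Wallace rule used here (2 °C per A/T, 4 °C per G/C) is only a rule-of-thumb estimate for short oligonucleotides and is an assumption of this sketch; the source does not state how the primers were designed or which annealing temperature was used. The function and variable names are illustrative.

```python
# Primer sequences as quoted in the text (5'-primer and 3'-primer for MMP-9).
MMP9_FORWARD = "GATGCGTGGAGAGTCGAAAT"
MMP9_REVERSE = "CACCAAACTGGATGACGATG"


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule.

    Assumption of this sketch: Tm = 2*(A+T) + 4*(G+C), a rule of thumb
    for short oligos, not the method used in the paper.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def primer_summary(seq: str) -> dict:
    """Collect the basic properties of one primer."""
    return {
        "length": len(seq),
        "gc_fraction": gc_fraction(seq),
        "wallace_tm_c": wallace_tm(seq),
    }


if __name__ == "__main__":
    for name, seq in (("forward", MMP9_FORWARD), ("reverse", MMP9_REVERSE)):
        print(name, primer_summary(seq))
```

Both primers are 20-mers with matching base composition, so the two estimates agree; a nearest-neighbor model would give different absolute values.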

Western Blots, Immunoprecipitation, Zymography, Electrophoretic
Mobility-Shift Assay (EMSA), Luciferase, MMP-9 Activity ELISA, and
Immunocytochemistry. Cytoplasmic and nuclear extracts were
prepared as described (4). Anti-Flag antibody (M2, Sigma) was
diluted 1:1,000. MMP-9 antibody (Ab-2, Oncogene Research
Products) was used for immunoprecipitations (1:20) and Western
blots (1:1,000). Zymograms were performed as described
(13, 14). EMSA was carried out as described by using a
high-affinity m67 binding probe (4). HMLHT cells (2 × 10^4/24-well
dish) were transiently transfected with 0.4 μg of MMP-9
Luciferase construct and 0.4 μg of either pBabe or pBabe Stat3-C, by
using Lipofectamine 2000 (GIBCO/BRL). Luciferase activity
(Promega) was measured 24 h later. MMP-9 activity ELISA
(Amersham Pharmacia) was conducted according to the
manufacturer's instructions. In situ zymography was performed as
described (15). HMLHT cells grown on multichamber slides
were overlayed with DQ gelatin (100 μg/ml) for 2 h at 37°C,
washed, stained with 4',6-diamidino-2-phenylindole (DAPI),
fixed, and analyzed by confocal laser microscopy.
Immunocytochemistry was performed by fixing cells in 50:50
acetone:methanol and permeabilizing them with 0.1% Triton X-100.
MMP-9 Ab-1 (Oncogene Research Products) was added overnight at 4°C
(1:20).

Abbreviations: STAT, signal transducer and activator of transcription; MMP-9, matrix
metalloproteinase-9; HMEC, human mammary epithelial cells; EMSA, electrophoretic
mobility-shift assay; APMA, 4-aminophenylmercuric acetate.

Immunohistochemistry. Multitissue blocks of formalin-fixed,
paraffin-embedded breast cancer tissue (containing four
representative 0.6-mm cores) were prepared by using a tissue arrayer, and
immunohistochemistry was performed as described (5). Antigen
retrieval using citric acid (pH 6.0) at 97°C for 30 min was
followed by treatment with 3% H2O2. Phospho-Stat3 (Tyr-705)
antibody (Cell Signaling Technology, Beverly, MA) was used at
1:200 dilution. The phospho-peptide used for generating the
antibody was used to confirm specificity of antibody binding.
MMP-9 antibody (NCL-MMP9, NovoCastra, Newcastle, U.K.)
was used at 1:50 dilution. Scoring of the tissue microarray was
performed by two independent observers (J.F.B. and T.N.D.)
with a high correlation between scorers (P < 0.001) for both
pStat3 and MMP-9. In order for a tumor to be considered
positive for either pStat3 or MMP-9, all four replicates in the
tissue array had to have a similar staining intensity; otherwise it
was excluded. Statistical analyses were done by using StatView
(SAS Inst., Cary, NC). The correlation between the scores of
both scorers and the relationship between that of pStat3 and
MMP-9 were measured by using the χ2 test.
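To illustrate the kind of χ2 test of association described above, the following is a minimal sketch for a 2 × 2 contingency table (for example, pStat3 high/low against MMP-9 high/low). It uses the Pearson statistic without continuity correction and a one-degree-of-freedom P value. The counts in the example are hypothetical placeholders and are not data from this study; the analyses reported here were run in StatView, and the exact table layout and any correction applied there are not stated in the text.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table is [[a, b], [c, d]]; returns (chi2, p_value) with 1 degree of
    freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (
        row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
    )
    # For 1 degree of freedom, the chi-square upper tail equals
    # erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not study data):
    # rows = pStat3 high / low, columns = MMP-9 high / low.
    example = [[8, 2], [3, 21]]
    chi2, p = chi_square_2x2(example)
    print(f"chi2 = {chi2:.2f}, P = {p:.4g}")
```

With small expected cell counts, as is common for tissue arrays of a few dozen tumors, an exact test is often preferred over the χ2 approximation.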

Results 

Stat3-C Transforms HMEC Cell Lines. Given the incidence of
phosphorylated Stat3 in primary breast cancer specimens, we wished
to determine whether the introduction of a constitutively
activated version of Stat3 (Stat3-C) was sufficient for mediating
transformation of HMECs. For these studies, we used two
different immortalized nontransformed HMEC lines. HMECs
from reduction mammoplasties were immortalized by
introducing both SV40 large-T antigen and the telomerase catalytic
subunit (8). MCF-10As are a spontaneously immortalized
human breast epithelial cell line mutant in the cdk inhibitor p16 (9).
Immortalized HMECs (referred to as HMLHT cells in this
article) and MCF-10A cell lines have many of the characteristics
of normal breast epithelium and do not form tumors in nude
mice nor form colonies in soft agar, but undergo transformation
upon the introduction of Ha-ras (8, 16).

Flag-tagged Stat3-C was introduced into MCF-10A and
HMLHT cells by retroviral gene transfer, and polyclonal populations
were selected. Western blot analysis showed expression of
Stat3-C in both MCF-10A and HMLHT cells (Fig. 1A). EMSA
of extracts from Stat3-C-expressing cells showed strong binding
to a high-affinity Stat3 binding site (m67) in contrast to extracts
from cell lines harboring the empty retroviral vector (Fig. 1B).
The DNA-protein complex could be supershifted with an
anti-Flag antibody but not by an anti-Stat1 antibody (data not shown).

A classical assay for cellular transformation is
anchorage-independent growth. Control and Stat3-C-expressing MCF-10A
and HMLHT cells were plated in soft agar; after 3 weeks, colony
formation was evident for Stat3-C-expressing cell lines but not
for control lines (Fig. 1C).

Fig. 1. Stat3-C induces tumorigenesis of HMLHT and MCF-10A cells in a
dose-dependent manner. (A) Anti-Flag Western blot showing Stat3-C
expression in MCF-10A and HMLHT cells expressing pBabe control vector (pB) and
pBabe-Stat3-C (pB-3C). (B) EMSA performed with nuclear extracts from cell
lines described in A. Stat3-C DNA binding was supershifted with anti-Flag
antibody, indicated with a +. (C) Colony formation in soft agar of empty
retroviral control (pB) and Stat3-C infected (pB-3C) MCF-10A and HMLHT
cells (mean ± SD). (D) Tumor growth in nonobese diabetic/severe combined
immunodeficient (NOD/SCID) mice when using the HMLHT pBabe control cell
line and subclones with high Stat3-C expression levels (no. 16 and no. 24) or
low Stat3-C expression (3CL). Results are expressed as the mean of 4-10
tumors ± SD at the indicated times after injection. (E) Nuclear extracts from a
Stat3-C-derived tumor (tu), normal murine breast tissue (nbr), and cell line no.
16 (+ con) were analyzed for the presence of Stat3-C by Flag immunoblot.


To  determine  whether  the  amount  of  Stat3-C  expressed 
influenced  the  efficiency  of  transformation,  single  clones  were 
isolated,  and  DNA-binding  assays  were  carried  out.  Low  (L)  and 
high  (H)  Stat3-C-expressing  clones  were  isolated  and  compared 
with  the  heterogeneous  population  (pB-3C)  (see  Fig.  6A,  which 
is  published  as  supporting  information  on  the  PNAS  web  site). 
Cells expressing low levels of Stat3-C did not grow in soft agar,
whereas cells with higher expression levels (H) formed colonies,
suggesting that a threshold amount of Stat3-C is required for
soft-agar growth (see Fig. 6B).

Two high-expressing Stat3-C clones (no. 24 and no. 16) were
injected s.c. into the flank of irradiated nonobese diabetic/severe
combined immunodeficient (NOD/SCID) mice and gave rise to
tumors in all animals in contrast to cells bearing the empty
retroviral vector or a low-expressing clone (L) (Fig. 1D). The
presence of Stat3-C within the tumor was determined by
anti-Flag Western blot analysis (Fig. 1E). Thus, Stat3-C can mediate
transformation of immortalized human breast epithelial cells.
This finding is an extension of our previous report that Stat3-C
induced transformation of immortalized murine fibroblasts (4).

Stat3-C Induced Gene Expression. It is logical that the mechanism(s)
by which this persistently active transcription factor mediates
cellular transformation is through activation of specific genes.
We next wished to identify differentially expressed mRNAs in
Stat3-C-containing HMLHT and MCF-10A cells. By RT-PCR
analysis of mRNA, Cyclin D1, Bcl-xL, myc, and vascular
endothelial growth factor (VEGF), known target genes of activated
Stat3 in fibroblasts, were not increased in the Stat3-C-expressing
cell lines compared with those bearing the empty retroviral
vector (data not shown). Thus, Affymetrix Gene Chip Analysis
was performed on RNA isolated from HMLHT-Stat3-C and
MCF-10A-Stat3-C cell lines compared with their respective,
vector-infected control cells. One hundred and forty-one
mRNAs were up-regulated, and 63 were down-regulated in the
HMLHT-Stat3-C-expressing cells compared with HMLHT cells
containing the empty retroviral vector; and 163 mRNAs were
up-regulated and 36 were down-regulated in the
MCF-10A-Stat3-C cells compared with MCF-10A cells bearing the empty
vector (2-fold, P < 0.001). We then determined those mRNAs
that were up- or down-regulated in both Stat3-C-expressing cell
lines. Twenty-three mRNAs were increased, and one decreased
in both cell lines (see Tables 1-3, which are published as
supporting information on the PNAS web site). Some transcripts
were increased by >8-fold in at least one of the
Stat3-C-containing cell lines. However, the importance of these
transcripts in tumorigenesis has not been well documented. One of
the mRNAs up-regulated in both of the Stat3-C-expressing cell
lines was MMP-9 (2.6- to 4-fold induction). Given the role of
MMP-9 in tumor formation, invasion, metastasis, and
angiogenesis (17), we focused our attention on this gene as possibly
relevant to Stat3-C-mediated transformation in these breast
epithelial cells.
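The selection logic described in this paragraph (a 2-fold change at P < 0.001 in each cell line, followed by taking the transcripts changed in both) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Affymetrix analysis pipeline used for the study: the record layout, the function names, and every gene label and value in the example are hypothetical placeholders, and the treatment of a 2-fold decrease as a ratio of 0.5 or less is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class ProbeResult:
    """One transcript compared between Stat3-C and vector-control cells."""
    gene: str           # placeholder identifier
    fold_change: float  # Stat3-C / control expression ratio
    p_value: float


def select_changed(
    results: Iterable[ProbeResult],
    min_fold: float = 2.0,
    max_p: float = 0.001,
) -> Tuple[Set[str], Set[str]]:
    """Return (up, down) gene sets passing the fold-change and P cutoffs."""
    up, down = set(), set()
    for r in results:
        if r.p_value >= max_p:
            continue  # not significant at the chosen threshold
        if r.fold_change >= min_fold:
            up.add(r.gene)
        elif r.fold_change <= 1.0 / min_fold:
            down.add(r.gene)
    return up, down


def shared_changes(
    line_a: Iterable[ProbeResult],
    line_b: Iterable[ProbeResult],
) -> Tuple[Set[str], Set[str]]:
    """Genes up-regulated in both cell lines and down-regulated in both."""
    up_a, down_a = select_changed(line_a)
    up_b, down_b = select_changed(line_b)
    return up_a & up_b, down_a & down_b


# Hypothetical example values (not data from the study).
HMLHT_EXAMPLE = [
    ProbeResult("GENE_A", 3.0, 0.0001),
    ProbeResult("GENE_B", 0.4, 0.0001),
    ProbeResult("GENE_C", 5.0, 0.01),    # fails the P cutoff here
]
MCF10A_EXAMPLE = [
    ProbeResult("GENE_A", 2.5, 0.0005),
    ProbeResult("GENE_B", 0.3, 0.0002),
    ProbeResult("GENE_C", 4.0, 0.0001),
]

if __name__ == "__main__":
    print(shared_changes(HMLHT_EXAMPLE, MCF10A_EXAMPLE))
```

Requiring a change in both independently derived cell lines is a simple way to reduce line-specific effects, at the cost of discarding transcripts that respond in only one background.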

MMP-9 Is Expressed and Zymographically Active in Stat3-C-Expressing
HMEC Lines. Relative levels of MMP-9 mRNA were determined
by RT-PCR in MCF-10A and HMLHT cells and found to be
increased in the Stat3-C-expressing cells compared with empty
retroviral vector-containing cells (Fig. 2A). To evaluate possible
transcriptional regulation of MMP-9 by Stat3-C, we transiently
transfected a luciferase construct containing the human MMP-9
promoter (with two potential Stat3-binding sites) with either
empty vector or Stat3-C into HMLHT cells. Stat3-C expression
led to a 4-fold increase of MMP-9 promoter-driven luciferase
activity in HMLHT cells (Fig. 2B). MMP-9 (gelatinase B) is
secreted as a 92-kDa pro-enzyme and cleaved by other proteases
to an activated 84-kDa form. By immunoprecipitation and Western
blotting, latent MMP-9 protein was increased in the cell
culture medium from Stat3-C-expressing HMLHT and
MCF-10A cells compared with that in the medium from their
respective control cell lines (Fig. 7A, which is published as supporting
information on the PNAS web site). MMP-9 and MMP-2
(gelatinase B and A) are the two major gelatinases produced by
cells. An increase in the latent 92-kDa MMP-9 was observed in
the cell culture medium from Stat3-C-expressing cells compared
with that from control-infected cells by gelatin zymography
(Fig. 2C). The latent form of MMP-9 is active zymographically due to
the denaturing conditions of SDS/PAGE, which reveals the
catalytic domain of MMP-9. Notably, gelatin zymography did
not reveal any 72-kDa MMP-2 activity. Moreover, Stat3-C
protein levels in HMLHT cells positively correlated with latent
MMP-9 expression as determined by gelatin zymography
(Fig. 7B). Thus, an increase of only the latent form of MMP-9 is
observed in the cell culture medium of Stat3-C-expressing
MCF-10A and HMLHT cells.

Fig. 2. Stat3-C-dependent induction of MMP-9 mRNA, luciferase activity,
and protein. (A) Induction of MMP-9 mRNA in pBabe-Stat3-C (pB-3C)- and
pBabe (pB)-infected MCF-10A and HMLHT cells determined by RT-PCR
(Upper), normalized to GAPDH (Lower). (B) HMLHT cells were transfected with an
MMP-9 promoter luciferase construct in conjunction with either pBabe (pB)-
or pB Stat3-C (3-C)-expressing plasmids. Luciferase activities are shown as the
mean ± SD of three experiments performed in duplicate. (C) MMP-9 protein
expression in cell culture medium from pBabe (pB)- and pBabe-Stat3-C
(pB-3C)-infected MCF-10A and HMLHT cells shown by gelatin zymography.


Proteolytically Active MMP-9 Is Localized to the Cell Surface of
Stat3-C-Containing Cells. A second assay for MMP-9 activity,
which measures only cleaved (84-kDa) protein, showed as
expected no extracellular activity in either control or
Stat3-C-expressing cells (Fig. 3A, black columns). In this assay, the
total MMP-9 activity can be measured by treating the samples
with 4-aminophenylmercuric acetate (APMA), which results
in the cleavage of the MMP-9 pro-peptide, revealing
enzymatically active MMP-9. After APMA treatment, an increase in
MMP-9 in the medium from Stat3-C-expressing HMLHT cells
was observed (Fig. 3A, gray columns). In contrast, total
cell-associated MMP-9 activity was approximately 8-fold higher in
Stat3-C-expressing HMLHT cells as compared with vector-infected
cells (Fig. 3B, black columns). Treatment of these extracts with
APMA led to only a modest increase in activity, suggesting that
much of the cell-associated MMP-9 is in an enzymatically
active form (Fig. 3B, gray columns). We also examined
gelatinase activity in situ on cells grown in culture (Fig. 3C).
Fluorescein-conjugated gelatin (DQ gelatin) was overlayed on
cells, revealing an increase in fluorescence in the
Stat3-C-expressing cells compared with control cells, which is a
measure of the proteolytic activity of the gelatinase (Fig. 3C
Upper). Furthermore, this activity was reduced in the presence
of a dual specific MMP-2/9 enzymatic inhibitor, an
N-sulfonylamino acid derivative that chelates zinc at the active
site and inhibits MMP-2/9-dependent invasion, tumor growth,
and metastasis in both cell culture and mouse tumor models
(18, 19) (Fig. 3C Lower). Given the lack of MMP-2 expression
in the Stat3-C-containing HMLHT cells as determined by
zymography (Fig. 2C), we felt that this inhibitor was
appropriate for the assay. The cellular localization of MMP-9
was examined by immunocytochemistry and was found to
be predominantly in a membrane-associated distribution
(Fig. 3D).

Fig. 3. Active MMP-9 is localized to the cell surface. (A and B) An ELISA
specific for enzymatically active MMP-9 was performed on cell culture medium
(A) and cell extracts (B) from pBabe (pB)- and pBabe-Stat3-C
(pB-3C)-expressing cells. MMP-9 activity from cell culture medium and cell extracts was
measured without (black columns) and with pretreatment with APMA (gray
columns). Results are shown as the mean ± SD of three experiments
performed in duplicate. (C) In situ zymography of HMLHT cells expressing pBabe
and pBabe-Stat3-C (pB-3-C) treated with DMSO (Upper) or 1.5 μM MMP-2/9
inhibitor (Lower). The cells were then overlayed with DQ gelatin. Green
staining indicates MMP-9-digested gelatin whereas blue indicates nuclear
staining [4',6-diamidino-2-phenylindole (DAPI)]. (D) MMP-9 expression shown
by immunofluorescence in the cell lines described in C.

Inhibition of MMP-9 Reduces Stat3-C-Dependent Transformation in
HMLHT Cells. To determine whether the enzymatic activity of
MMP-9 contributes to Stat3-C-induced anchorage-independent
growth of HMLHT cells, a polyclonal population of
Stat3-C-expressing cells and a high Stat3-C-expressing clone (data not
shown) were grown in soft agar in the presence of the MMP-2/9
inhibitor. Colony formation was attenuated in the presence of
increasing concentrations of the MMP-2/9 inhibitor (Fig. 4A).


Fig. 4. MMP-9 activity is required for Stat3-C-dependent
anchorage-independent growth. (A) Anchorage-independent growth of pBabe-Stat3-C
(pB-3C)-, pBabe v-src (pB-src)-, and pBabe H-ras V12 (pB-ras)-expressing
HMLHT cells. DMSO (D) control and increasing concentrations of MMP-2/9
inhibitor in μM were added to the soft agar assay every other day (mean ± SD).
(B) Gelatin zymography of supernatants derived from HMLHT cells expressing
either pBabe (pB), pBabe-Stat3-C (3C), pBabe v-src (src), or pBabe H-ras V12
(ras) and 0.5 ng of recombinant MMP-9 as a loading control.


The MMP-2/9 inhibitor did not influence the proliferation of
Stat3-C-expressing HMLHT cells grown in monolayer culture
(see Fig. 8, which is published as supporting information on the
PNAS web site). Specificity of the MMP-2/9 inhibitor was
examined in HMLHT cells transformed by either v-src or
H-rasV12. Colony formation of HMLHT cells expressing v-src,
an oncogene that activates Stat3 and requires Stat3 for its
transforming capacity (20, 21), was suppressed by 1.5 μM
MMP-2/9 inhibitor (Fig. 4A). In contrast, H-rasV12-induced
anchorage-independent growth of HMLHT cells was not affected
by 1.5 μM inhibitor (Fig. 4A). Gelatin zymography
revealed high levels of latent MMP-9 in the medium of v-src-
transformed HMLHT cells, whereas the cell culture medium
from H-rasV12-expressing cells did not have any detectable
MMP-9 but did contain increased MMP-2 levels (Fig. 4B). These
results demonstrate that MMP-9 activity is required for
anchorage-independent growth of HMLHT cells induced by Stat3-C
and v-src but not by H-rasV12.

MMP-9 Expression Correlates with That of Activated Stat3 in Primary
Breast Cancer Specimens. Immunohistochemical analysis of
microtissue arrays of primary human breast cancer specimens (34
tumor specimens and 8 normal) shows that 27% contain high
levels (+++) of nuclear phospho-Stat3 (pStat3), 30% contain
moderate levels of nuclear pStat3 (++), and 42% contain little
to no pStat3 (0/+) (Fig. 5). Normal breast has little to no pStat3
(Fig. 5, Bottom). It has been determined by immunohistochemistry
that MMP-9 is overexpressed in primary breast carcinomas
(22-26). The cellular distribution of MMP-9 protein in
paraffin sections is typically cytoplasmic (23-27). We stained
sequential, serial sections of the breast microtissue arrays with
anti-sera to MMP-9 and observed strong cytoplasmic and
perinuclear staining in 27% of these tumor specimens (+++),
moderate staining in 32% (++), and little to no staining in
38% (0/+) (Fig. 5). The majority of the MMP-9 staining was
specific to the epithelial cells. However, the stromal cells
surrounding the epithelial cells were also positive in two samples
(data not shown). Not all samples that stained positively for
pStat3 were also positive for MMP-9. However, a statistically
significant positive correlation was observed between (+++/++)
staining for pStat3 and MMP-9 (P < 0.001).

Fig. 5. Persistently phosphorylated Stat3 correlates with MMP-9 expression
in primary breast cancer samples. Immunohistochemistry was performed on
sequential sections of 34 primary breast cancer microtissue arrays with anti-
phospho-Stat3 (pStat3) and anti-MMP-9 antibodies. A schematic overview of
the tissue arrays and a summary of the immunohistochemistry results are
shown. Representative sections of strong staining are indicated as +++ and
shaded in black, moderate staining as ++ and shaded in gray, and weak to no
staining as 0/+ and shaded in white. Normal breast had 0/+ staining for both
pStat3 and MMP-9. A positive correlation was observed between (+++/++)
staining for pStat3 and MMP-9 (P < 0.001) by χ² test.


Discussion 

Breast carcinogenesis is a process dependent upon the loss of
tumor suppressors and gain of oncogenes. Our data suggest that
activated Stat3 plays a role in breast tumorigenesis in part
through the actions of MMP-9. Stat3 is persistently activated in
a large fraction of primary breast cancers both by biochemical
and immunohistochemical analyses (3, 5-7). Here, we demonstrate,
by using two immortalized breast epithelial cell lines used
to define oncogenes involved in breast tumorigenesis, the
sufficiency of Stat3-C in mediating transformation. We also
determined that a threshold amount of Stat3-C is required for growth
in soft agar and in nude mice.

Further  characterization  of  Stat3-C-expressing  MCF-10A  and 
HMLHT  cells  did  not  reveal  any  significant  differences  in  growth 
rate,  growth-factor  requirement,  or  resistance  to  proapoptotic 
stimuli  (data  not  shown).  The  mechanism  of  transformation  by 
Stat3-C  is  proposed  to  be  through  the  genes  it  transcriptionally 
regulates.  Some  of  the  known  targets  of  Stat3-C  in  fibroblasts 
were  not  altered  in  the  breast  epithelial  cell  lines.  Transcriptional 
regulation of genes by activated Stat3 is likely dependent upon
the cellular context, and thus so is the mechanism of transformation.
By Affymetrix Gene Chip analysis, a short list of transcripts was
identified (and many confirmed by RT/PCR) that were commonly
up- or down-regulated in the Stat3-C-transformed cell
lines (data not shown). Some of these transcripts may be involved
in Stat3-C-mediated transformation, but we focused our attention
on MMP-9.

By immunohistochemistry of cancer specimens, MMPs and in
particular gelatinases have been found to be up-regulated in
almost every tumor entity, including breast cancer (22-28). Cell
culture and mouse experiments with mammary epithelial cells
and cancer cells have revealed a crucial role for MMP-9 in tumor
growth, invasion, metastasis, and angiogenesis (29-32). Many
molecules and signaling pathways have been reported to be
involved in the induction of MMP-9 in breast cancer cells, such
as heregulin, estrogen, epidermal growth factor (EGF), c-jun,
NF-κB, and mitogen-activated protein kinase (MAPK) (28,
33-37). In addition to tumor-derived MMP expression, it is
largely accepted that the tumor environment plays a crucial role
in the activity of MMPs (17). Nevertheless, it has been
demonstrated that expression of MMP-3/Stromelysin-1 is sufficient to
transform mammary epithelial cells in culture as well as in a
breast-specific transgenic mouse model, demonstrating an
oncogenic potential of MMPs produced by epithelial cells (39).

Here,  we  show  that  MMP-9  mRNA  and  protein  can  be  induced 
by  Stat3-C  in  mammary  epithelial  cells.  The  MMP-9  promoter 
contains  multiple  putative  Stat3-binding  sites,  two  of  which  can  be 
considered  as  high-affinity  binding  sites  (11).  However,  a  direct 
association  between  Stat3  and  the  MMP-9  promoter  by  chromatin 
immunoprecipitation  has  not  been  observed  (data  not  shown). 
Nevertheless,  an  MMP-9  promoter  luciferase  construct  (-670)  is 
induced  at  least  4-fold  by  Stat3-C  when  transfected  into  HMLHT 
cells.  We  observed  an  increase  in  the  levels  of  latent  MMP-9  protein 
from  conditioned  media  isolated  from  cells  expressing  Stat3-C. 
Furthermore,  we  demonstrated  that  proteolytically  active  MMP-9 
is  localized  primarily  to  the  cell  surface,  which  is  in  accordance  with 
prior  studies  supporting  a  role  for  cell  surface-associated  MMP-9 
with respect to its enzymatic and biological activity (13, 14, 39). By
using a dual-specific MMP-2/9 inhibitor, we observed suppression
of anchorage-independent growth of Stat3-C- and v-src (an oncogene
that activates and requires Stat3 for transformation)-
expressing cells but not of H-rasV12-transformed HMLHT cells.
Thus, this inhibitor does not decrease growth in soft agar
nonspecifically, and the result indicates a crucial role for MMP-9 in
anchorage-independent growth induced by Stat3-C and v-src in HMLHT cells.

We have examined the abundance and distribution of tyrosine-
phosphorylated Stat3 in primary breast cancer samples and find
that ~30% of the invasive tumors have strong staining for
nuclear tyrosine-phosphorylated Stat3. We did not have access
to prognostic information with our tissue-array samples and
therefore cannot say whether strong nuclear phospho-Stat3 is
associated with indolent or aggressive breast cancer.
Interestingly, high MMP-9 protein levels in sequential sections of the
tissue microarrays correlate with those of activated Stat3,
supporting our cell culture work showing that MMP-9 induced by Stat3
may contribute to mammary tumorigenesis.

and HER-2 status and absence of
EGFR-activating mutations

The human epidermal growth factor receptor (HER) family of receptor tyrosine kinases has been extensively
studied in breast cancer; however, systematic studies of EGFR gene amplification and protein overexpression
in breast carcinoma are lacking. We studied EGFR gene amplification by chromogenic in situ hybridization
(CISH) and protein expression by immunohistochemistry in 175 breast carcinomas, using tissue
microarrays. Tumors with >5 EGFR gene copies per nucleus were interpreted as positive for gene
amplification. Protein overexpression was scored according to standardized criteria originally developed for HER-2.
EGFR mRNA levels, as measured by Affymetrix U133 Gene Chip microarray hybridization, were available in 63
of these tumors. HER-2 gene amplification by fluorescence in situ hybridization (FISH) and protein
overexpression by immunohistochemistry were also studied. EGFR gene amplification (copy number range:
7-18; median: 12) was detected in 11/175 (6%) tumors, and protein overexpression was found in 13/175 (7%)
tumors. Of the 11 tumors with gene amplification, 10 (91%) also showed EGFR protein overexpression (2+ or
3+ by immunohistochemistry). The EGFR mRNA level, based on Affymetrix U133 chip hybridization data,
was increased relative to other breast cancer samples in three of the five amplified tumors for which mRNA data were available.
Exons 19 and 21 of EGFR, the sites of hotspot mutations in lung adenocarcinomas, were screened in the 11
EGFR-amplified tumors but no mutations were found. Three of these 11 tumors also showed HER-2
overexpression and gene amplification. Approximately 6% of breast carcinomas show EGFR amplification with
EGFR protein overexpression and may be candidates for trials of EGFR-targeted antibodies or small inhibitory
molecules.

Modern  Pathology  advance  online  publication,  13  May  2005;  doi:10.1038/modpathol.3800438 

Keywords:  breast  cancer;  EGFR;  gene  amplification;  mRNA  expression;  mutation;  protein  overexpression;  tissue 
microarray 


The epidermal growth factor receptor (EGFR, HER-1,
c-erbB-1) is one of the four transmembrane growth
factor receptor proteins that share similarities in
structure and function. Together, this group
comprises the human epidermal growth factor receptor
(HER) (c-erbB) family of receptor tyrosine kinases.
The EGFR gene is located on the short arm of
chromosome 7 and encodes a 170 kDa transmembrane
protein consisting of an extracellular EGF-
binding domain, a short transmembrane region,
and an intracellular domain with ligand-activated
tyrosine kinase activity.1 Two ligands can activate
EGFR: epidermal growth factor (EGF) and transforming
growth factor-alpha (TGF-α). Ligand binding to
EGFR results in receptor homo- or hetero-dimerization
(with one of the HER family of receptor tyrosine
kinases) followed by autophosphorylation of the
tyrosine kinase domain.2 Phosphorylated tyrosine
residues serve as binding sites for the recruitment of
signal transducers and activators of intracellular
substrates. The Ras-Raf mitogen-activated protein
kinase pathway and the phosphatidylinositol 3'-
kinase and Akt pathway are the major signaling
routes for the HER family, including EGFR.3-6
These pathways control several important biologic
processes, including cellular proliferation,
angiogenesis and inhibition of apoptosis.7

The  interest  in  EGFR  is  further  enhanced  by  the 
availability  and  FDA  approval  of  specific  EGFR 
tyrosine  kinase  inhibitors  (eg,  gefitinib).  Many  of 
these  studies  have  focused  on  lung  cancer,  where 
approximately  10%  of  patients  have  a  rapid  and 
often  dramatic  clinical  response.8-10  These  gefitinib- 
responsive  lung  cancers  have  been  found  to  contain 
somatic  mutations  in  the  tyrosine  kinase  domain  of 
the  EGFR  gene.8-10  The  data  regarding  the  presence 
or  absence  of  EGFR  gene  amplification  in  other 
tumor  types,  and  their  response  to  these  EGFR 
tyrosine  kinase  inhibitors  are  still  limited.  EGFR 
protein  overexpression  has  been  reported  to  occur  in 
16-36%  of  breast  cancers;  however,  systematic 
studies evaluating gene amplification, mRNA expression
and protein expression in the same set of
cases  are  lacking.11-13  In  order  to  address  this  issue, 
we  studied  175  breast  cancers  for  the  presence  of 
EGFR  gene  amplification.  In  addition,  we  analyzed 
EGFR  protein  expression,  HER-2  protein  expression 
and  gene  amplification  in  these  tumors.  We  also 
examined  EGFR  transcript  levels  in  a  subset  of  these 
tumors  by  Affymetrix  U133  chip  hybridization  and 
performed a mutational screen of the EGFR-amplified
cases.

Materials  and  methods 

Case  Selection  and  Tissue  Microarray  Construction 

In  all,  188  randomly  selected  invasive  breast 
carcinomas  were  included  in  this  study.  Tissue 
microarrays  were  created  using  0.6  mm  tissue  cores 
as previously described.14-18 An H&E-stained section
was evaluated for the presence of invasive
breast  carcinoma  and  the  area  to  be  used  for  creation 
of  the  tissue  microarrays  was  marked  on  the  slide 
and  the  donor  block.  Three  to  four  cores  from 
different  areas  of  the  tumor  were  sampled  for  each 
tumor. 


Histologic  Examination 

Histologic assessment of tumor type and grade was
routinely performed on 4-5 μm thick H&E sections
of formalin-fixed paraffin-embedded tumors. The
nuclear grades of invasive ductal and lobular
carcinomas were designated as follows: grade 1,
small, regular uniform cells; grade 2, moderate
increase in size and variability; grade 3, marked
variation in size and shape. The architectural grades
of invasive ductal carcinomas were designated as
follows: grade 1, well developed (>75%) tubule
formation; grade 2, moderate (10-75%) tubule
formation; grade 3, little or no (<10%) tubule
formation.


Immunohistochemistry 

Tissue microarray sections (4-5 μm thick) were used
for all immunohistochemical analyses. The Ventana
CONFIRM™ antiestrogen receptor (clone 6F11) and
antiprogesterone receptor (clone 16) monoclonal
antibodies were used for immunohistochemical
analyses of estrogen receptor and progesterone
receptor, respectively, performed on the Ventana
automated slide stainers according to the
manufacturer’s instructions (Ventana Inc., Tucson, AZ, USA).
The estrogen receptor or progesterone receptor
results were manually screened and were
interpreted as positive when more than 10% of tumor
cells showed positive nuclear staining. HER-2
immunohistochemistry was performed using the
HercepTest™ kit (DAKO Corp, Carpinteria, CA,
USA) and EGFR immunohistochemistry was
performed using a monoclonal EGFR antibody (Clone
31G7, Zymed Laboratories Inc., South San Francisco,
CA, USA) according to the manufacturer’s
instructions; both HER-2 and EGFR results were
interpreted manually as follows: 0, no membrane
staining; 1+, faint, partial membrane staining; 2+,
weak, complete membrane staining in >10% of
invasive cancer cells; 3+, intense complete
membrane staining in >10% of invasive cancer cells.
The highest immunohistochemical score obtained
among different cores of the same tumor was used as
the final immunohistochemical result of that tumor.
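
The per-tumor scoring rule described above can be illustrated with a minimal Python sketch. It is not part of the original methods; the function names are hypothetical, and the overexpression cutoff (2+ or 3+) is taken from the definition used elsewhere in this report.

```python
# Ordinal ranking of the membrane-staining scores described in the text.
SCORE_ORDER = {"0": 0, "1+": 1, "2+": 2, "3+": 3}


def tumor_ihc_score(core_scores):
    """Return the highest score among the evaluable cores of one tumor."""
    if not core_scores:
        raise ValueError("no evaluable cores for this tumor")
    return max(core_scores, key=SCORE_ORDER.__getitem__)


def is_overexpressed(score):
    """Scores of 2+ and 3+ count as overexpression; 0 and 1+ do not."""
    return SCORE_ORDER[score] >= 2


# Hypothetical tumor sampled with three cores.
final_score = tumor_ihc_score(["0", "2+", "1+"])
print(final_score, is_overexpressed(final_score))
```

Taking the maximum across cores means a single strongly staining core determines the tumor's result, which is how the rule is stated in the text.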


Chromogenic  In  Situ  Hybridization 

Chromogenic  in  situ  hybridization  (CISH)  for  EGFR 
gene  was  performed  according  to  the  manufacturer’s 
instructions.  Briefly,  the  tissue  microarray  sections 
were  incubated  at  55°C  overnight.  The  slides  were 
deparaffinized  in  xylene  and  graded  ethanols.  Heat 
pretreatment  was  carried  out  in  the  pretreatment 
buffer  (Zymed  Laboratories  Inc.)  at  98-100°C  for 
15  min.  The  tissue  was  digested  with  pepsin  for 
10  min  at  room  temperature.  After  application  of 
Zymed  SpotLight®  digoxigenin  labeled  EGFR  probe 
(Zymed Laboratories Inc.), the slides were coverslipped
and edges sealed with rubber cement. The
slides were heated at 95°C for 5 min followed by
overnight incubation at 37°C using a moisturized
chamber. Posthybridization wash was performed the
next day and followed by immunodetection using
the CISH™ polymer detection kit (Zymed
Laboratories Inc.). The CISH signals were counted in at
least 30 nuclei with a light microscope using a ×40
objective.  A  tumor  was  interpreted  as  positive  for 
gene  amplification  when  the  average  number  of  gene 
copies  was  >5  per  nucleus. 
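
As an illustration of the interpretation rule above (an average of more than 5 gene copies per nucleus, counted in at least 30 nuclei), a short Python sketch follows. The function name and the choice to return `None` for non-evaluable cases are assumptions made for the example.

```python
from statistics import mean


def cish_amplified(copies_per_nucleus, min_nuclei=30, threshold=5):
    """Classify a tumor from per-nucleus CISH signal counts.

    Returns None when fewer than `min_nuclei` nuclei were scored,
    otherwise True if the mean copy number exceeds `threshold`.
    """
    if len(copies_per_nucleus) < min_nuclei:
        return None  # not evaluable
    return mean(copies_per_nucleus) > threshold


# Hypothetical counts: 30 nuclei with 12 signals each, and 30 with 2 each.
print(cish_amplified([12] * 30))
print(cish_amplified([2] * 30))
```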


Fluorescence  In  Situ  Hybridization 

Fluorescence in situ hybridization (FISH) for HER-2
was performed using the PathVysion HER-2 probe
kit (Vysis Inc., Downers Grove, IL, USA) as
previously described.17 The signal enumeration was
performed under ×1000 magnification. The number
of chromosome 17 signals, HER-2 signals, and
number of tumor nuclei scored were recorded for
each core. At least 30 cells were counted per tissue
core. Tumors were interpreted as amplified when
the ratio of HER-2/chromosome 17 signals was >2.0.
The average ratio of different cores from the same
tumor was used as the final score for determination
of gene amplification status of that particular tumor.
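
The FISH scoring rule can be sketched in the same way. This is an illustrative Python example under stated assumptions: the tuple layout, the function name, and the decision to skip cores with fewer than 30 counted cells are hypothetical conventions, not details given in the original methods.

```python
def her2_fish_status(cores, min_cells=30, cutoff=2.0):
    """Average the HER-2/chromosome 17 signal ratio over evaluable cores.

    `cores` is a list of (her2_signals, chr17_signals, cells_counted)
    tuples, one per tissue core. Returns (mean_ratio, amplified) or
    None when no core is evaluable.
    """
    ratios = [
        her2 / chr17
        for her2, chr17, cells in cores
        if cells >= min_cells and chr17 > 0
    ]
    if not ratios:
        return None
    mean_ratio = sum(ratios) / len(ratios)
    return mean_ratio, mean_ratio > cutoff


# Hypothetical tumor with two evaluable cores.
print(her2_fish_status([(228, 60, 30), (240, 62, 31)]))
```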


EGFR  mRNA  Expression 

EGFR  mRNA  levels  were  determined  in  a  subset  of 
cases using Affymetrix human genome U133 GeneChip®
expression arrays. RNA extraction, RNA
target  synthesis,  and  target  labeling  were  performed 
as  previously  described.19  Gene  expression  analysis 
was  carried  out  using  the  Affymetrix  U133A  human 
gene  array,  which  has  22  283  features  for  individual 
gene/EST  clusters,  using  instruments  and  protocols 
recommended  by  the  manufacturer.  For  each  gene 
on every sample we extracted two response measures,
the Average Difference and Absolute Call, as
determined  by  the  default  settings  of  Affymetrix 
Microarray  Suite  5.0.  Expression  values  on  each 
array  were  multiplicatively  scaled  to  have  an 
average  expression  of  500  across  the  central  96% 
of  all  genes  on  the  array.  Calculations  of  relative 
EGFR  transcript  levels  were  based  on  data  from 
Affymetrix  probe  set  201984_s_at. 
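
The scaling step can be illustrated with a small Python sketch. It assumes that "the central 96%" means discarding the lowest 2% and the highest 2% of expression values before averaging; that reading, and the function name, are assumptions for the example rather than details stated in the text.

```python
def scale_array(values, target=500.0, central_fraction=0.96):
    """Multiplicatively scale one array so that the mean of the central
    `central_fraction` of its values equals `target`."""
    ordered = sorted(values)
    n = len(ordered)
    # Number of values dropped from each tail (2% per tail by default).
    trim = int(n * (1 - central_fraction) / 2)
    core = ordered[trim:n - trim] if trim else ordered
    factor = target / (sum(core) / len(core))
    return [v * factor for v in values]


# Hypothetical array of 100 raw expression values.
raw = list(range(1, 101))
scaled = scale_array(raw)
central = sorted(scaled)[2:98]
print(sum(central) / len(central))
```

Because every value on the array is multiplied by the same factor, ratios between genes on one array are unchanged; only comparisons across arrays are affected.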


EGFR  Mutation  Analysis 

Selected cases were analyzed for the presence of
hotspot mutations in exon 19 (short in-frame
deletions) and exon 21 (L858R mutation) that
together account for approximately 90% of EGFR
mutations detected in lung cancers.8-10 Exon 19
deletions were studied by length analysis of
fluorescently labeled polymerase chain reaction (PCR)
products on a capillary electrophoresis device, and
the exon 21 L858R mutation was detected by PCR
followed by Sau96I restriction enzyme digestion,
based on a new Sau96I site created by the L858R
mutation (2819T>G), followed by capillary
electrophoresis of the Sau96I-digested fluorescently
labeled PCR products. These sensitive assays can
detect mutations in the presence of up to 90%
non-neoplastic cells and are described in detail
elsewhere.20
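
The principle of the exon 21 assay, a T>G substitution creating a new Sau96I site (recognition sequence GGNCC), can be illustrated with a short Python sketch. The 27-base fragment below is an assumption for illustration: it is intended to represent the coding sequence for residues 855-863 (DFGLAKLLG) and is not taken from this report, nor does it reproduce the actual PCR amplicon or primers of the assay.

```python
import re

# Sau96I recognizes GGNCC; a lookahead is used so overlapping sites are found.
SAU96I_SITE = re.compile(r"(?=GG[ACGT]CC)")

# Assumed wild-type fragment, codons 855-863: GAT TTT GGG CTG GCC AAA CTG CTG GGT.
WILD_TYPE = "GATTTTGGGCTGGCCAAACTGCTGGGT"
CODON_858_START = 9  # zero-based index of the CTG (Leu) codon in the fragment


def sau96i_sites(seq):
    """Return zero-based start positions of Sau96I sites in `seq`."""
    return [m.start() for m in SAU96I_SITE.finditer(seq.upper())]


def apply_l858r(seq):
    """Change the second base of codon 858 from T to G (CTG -> CGG, Leu -> Arg)."""
    codon = seq[CODON_858_START:CODON_858_START + 3]
    if codon != "CTG":
        raise ValueError("fragment does not carry the expected CTG codon")
    pos = CODON_858_START + 1
    return seq[:pos] + "G" + seq[pos + 1:]


MUTANT = apply_l858r(WILD_TYPE)
print(sau96i_sites(WILD_TYPE))  # no site expected in the wild-type fragment
print(sau96i_sites(MUTANT))     # one new site expected in the mutant fragment
```

In this fragment the substitution turns GGGCTGGCC into GGGCGGGCC, so only mutant product is cut, which is what allows the digested and undigested products to be distinguished by fragment length.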


Results 

We obtained both CISH and immunohistochemistry
EGFR data on 175 of the 188 breast cancers. Nine
tumors failed both CISH and immunohistochemistry;
four additional tumors failed immunohistochemistry
alone. The reasons for failure were a
complete loss of tissue cores from the tissue
microarrays, fewer than 30 tumor cells available for
scoring, and absence of hybridization signals. The
absence of signals probably resulted from under- or
over-digestion since tissue digestion for a particular
tumor cannot be adjusted on a tissue microarray.

EGFR  gene  copy  number  ranged  from  2  to  18  in 
the  samples  studied.  Copy  number  greater  than  5 
was  considered  amplified  and  identified  in  11/175 
(6%)  tumors  (Table  1).  The  gene  copy  number  in 
amplified  tumors  ranged  from  7  to  18  (mean:  12.1; 
median:  12)  and  in  nonamplified  tumors  ranged 
from  2  to  5  (mean:  2.4;  median:  2)  (Figure  1). 
Affymetrix  U133A  data  on  mRNA  levels  for  EGFR 
were  available  in  five  of  the  amplified  cases.  Three 
of  these  (Table  2)  showed  increased  EGFR  mRNA 
levels  greater  than  two-fold  of  the  average  EGFR 
mRNA  level  in  EGFR-nonamplified  tumors,  and  the 
remaining two tumors showed no significant increase
above the average EGFR mRNA level. The
mRNA  data  were  not  available  in  the  other  six 
EGFR-amplified  tumors.  No  statistically  significant 
correlation  between  gene  copy  number  and  level  of 
EGFR  transcript  was  found  in  this  small  number  of 
amplified  cases.  Of  the  164  tumors  without  EGFR 
gene  amplification,  mRNA  data  were  available  in  56 
tumors.  All  but  one  tumor  showed  normal  mRNA 
levels.  The  discordant  case  showed  a  7.4-fold 
increase  in  mRNA  level  (data  not  shown). 
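
The summary statistics for the amplified tumors can be checked against the per-case values listed in Table 2. The Python sketch below is only an arithmetic illustration; the variable names are hypothetical, and cases reported as "<2" are recorded separately because no exact fold change is given for them.

```python
from statistics import mean, median

# EGFR gene copies per nucleus for the 11 amplified tumors (Table 2).
copies = [7, 7, 8, 10, 11, 12, 15, 15, 15, 15, 18]

# Fold increase in EGFR mRNA for amplified cases with array data (Table 2).
mrna_fold = {6: 34.0, 7: 5.3, 11: 41.0}
below_twofold_cases = [9, 10]  # reported only as "<2"

print(round(mean(copies), 1))   # 12.1
print(median(copies))           # 12

increased = [case for case, fold in mrna_fold.items() if fold > 2]
with_data = len(mrna_fold) + len(below_twofold_cases)
print(len(increased), "of", with_data, "amplified tumors with mRNA data")
```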

By immunohistochemistry, the majority of breast
carcinomas demonstrated 0-1+ immunoreactivity
(162/175, 94%). Eight of the 11 breast carcinomas
with amplified EGFR showed 3+ immunoreactivity,
two tumors demonstrated 2+ and one tumor
was scored as 1+ (Table 1). There was a strong
correlation between 3+ immunoreactivity and gene
amplification (P < 0.0001, Fisher’s exact test). Three
of the 164 nonamplified tumors demonstrated EGFR
protein overexpression. Two of these three tumors
were poorly differentiated invasive ductal
carcinomas and were 2+ by immunohistochemistry; the
third tumor was an invasive pleomorphic lobular
carcinoma and showed immunoreactivity of 3+ for
EGFR without gene amplification.
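
For readers who wish to reproduce the comparison, the 2 × 2 table implied by Table 1 (3+ vs 0-2+ immunoreactivity, by amplification status) can be tested with a small, dependency-free Python sketch of a two-sided Fisher's exact test. This is an illustration rather than the software used in the study, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Rows: 3+ vs 0-2+ immunoreactivity; columns: amplified vs nonamplified.
p_value = fisher_exact_two_sided(8, 1, 3, 163)
print(p_value < 0.0001)
```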


Table 1 Correlation of EGFR gene amplification and protein
expression

Immunohistochemistry   Gene amplification   No gene amplification   Total
0                      0                    151                     151
1+                     1 (9%)               10                      11
2+                     2 (50%)              2                       4
3+                     8 (89%*)             1                       9
Total                  11 (6%)              164                     175

*P < 0.0001 (Fisher’s exact test for EGFR immunohistochemistry 0-2+
and 3+ vs amplification status).

Figure 1 EGFR protein expression by immunohistochemistry and gene amplification by CISH. (a) 0 by immunohistochemistry, (b) 1+ by
immunohistochemistry, (c) 2+ by immunohistochemistry, (d) 3+ by immunohistochemistry, (e) gene amplification (10-12 gene copies
per nucleus) by CISH, (f) no gene amplification (2-3 gene copies per nucleus) by CISH.


Specific assays for the most frequent EGFR
mutations in lung adenocarcinomas, exon 19
in-frame deletions and the exon 21 L858R point
mutation, were used to analyze all EGFR-amplified
tumors and the one tumor with 3+ EGFR
immunohistochemistry without EGFR gene amplification.
None of the tumors showed either of these hotspot
mutations in the EGFR gene (Table 2).

We evaluated the clinical and pathologic features
of EGFR-amplified breast cancers in an effort to
determine clinically relevant associations (Table 3).
In all, 10 of these 11 tumors were poorly
differentiated high-grade invasive ductal carcinoma, and
one was a spindle cell metaplastic carcinoma with
focal squamous differentiation. All of them were
negative for estrogen receptor and progesterone
receptor, but three of them were positive for HER-2
(Table 3). EGFR amplification appears to be
inversely correlated with estrogen receptor expression.
There was no correlation between EGFR
amplification and HER-2 amplification. Three of the 11
patients developed distant metastases at 40, 42, and
48 months, respectively, after the initial diagnoses
(Table 3). The first two patients (No. 6 and 7) died of
disease at 84 and 55 months, respectively, and the
third patient (No. 9) is alive with lung and bone


Table 2 Detailed data on EGFR protein expression by
immunohistochemistry, mRNA level, gene copy number by CISH, and
mutation status in tumors with EGFR amplification (n = 11)

Case no.   CISHa   Immunohistochemical score   mRNAb   Hotspot mutationsc
1          7       1+                          NA      NF
2          7       2+                          NA      NF
3          8       3+                          NA      NF
4          10      3+                          NA      NF
5          11      3+                          NA      NF
6          12      3+                          34      NF
7          15      2+                          5.3     NF
8          15      3+                          NA      NF
9          15      3+                          <2      NF
10         15      3+                          <2      NF
11         18      3+                          41      NF

aData represent EGFR gene copy number per nucleus.
bData represent fold increase above average mRNA level of EGFR-
nonamplified tumors derived from Affymetrix U133A chip
hybridizations. Calculations of relative EGFR transcript levels were based on
data from Affymetrix probe set 201984_s_at.
cMutations in EGFR exon 19 (short in-frame deletions) and exon 21
(L858R mutation).

CISH: chromogenic in situ hybridization; NA: not available; NF: not
found.


metastases  at  89  months.  One  other  patient  (No.  8) 
died  of  unrelated  causes  at  34  months.  The  mean 
follow-up  of  the  11  patients  is  73  months.  Owing  to 
the  limited  number  of  informative  cases,  we  were 
unable  to  determine  whether  EGFR  amplification 
and/or  EGFR  overexpression  is  an  independent 
prognostic  indicator. 
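
The follow-up figures quoted above can be checked against the survival column of Table 3. The short Python sketch below is only an arithmetic illustration; the variable names are hypothetical.

```python
from statistics import mean

# Survival or follow-up in months for cases 1-11, as listed in Table 3.
follow_up_months = [38, 141, 74, 40, 91, 84, 55, 34, 89, 92, 66]

# Months to distant metastasis for the three patients who recurred
# (cases 6, 7 and 9).
recurrence_months = {6: 40, 7: 42, 9: 48}

print(round(mean(follow_up_months)))  # 73
print(len(recurrence_months), "of", len(follow_up_months), "patients recurred")
```

With only three recurrences among 11 patients, any survival comparison would have very little power, which is consistent with the authors' statement that prognostic significance could not be assessed.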


Discussion 

Although the EGFR gene was identified more than
two decades ago,21 clinical interest in the gene has
recently been heightened by the discovery of EGFR
inhibitors. In 1996, Yang et al22 demonstrated that
treatment with genistein, an inhibitor of tyrosine
kinase activity, inhibited EGF-induced tyrosine
phosphorylation and degradation of EGFR in HepG2
cells, suggesting that tyrosine kinase activity is
required for either the internalization or the
degradation of EGF-EGFR receptor complexes. EGFR
kinase inhibitors have recently received FDA
approval for use in cancer therapy.

In  this  study,  we  used  CISH  to  detect  EGFR  gene 
amplification in breast carcinomas. Our data
revealed that EGFR gene amplification is an
infrequent event in breast cancer, occurring in only 6%
of tumors. This percentage is in the middle of the
range reported by the few previous studies that
have examined EGFR copy number in breast cancer
(0.8-14%).23,24

EGFR overexpression was seen in 6% of tumors in
our  current  study,  which  correlated  well  with  gene 
amplification.  Most  studies  that  have  reported  a 
higher  percentage  of  EGFR  overexpression  have  not 
evaluated  gene  amplification.11-13  Differences  in  the 
prevalence  of  EGFR  overexpression  reported  by 
different  studies  may  be  due  to  variations  in 
techniques  and  type  of  antibodies  used,  criteria 
for  determining  overexpression  and  interobserver 
variability. For example, Harris et al11 measured
EGFR  in  221  primary  breast  cancers  by  ligand 


Table 3 Detailed clinical and pathologic data in tumors with EGFR amplification (n = 11)

Case  Age      Stage  Tumor type   Architectural  Nuclear  HER-2   HER-2  ER  PR  Recurrence  Survival
no.   (years)                      grade          grade    FISHa   IHC            (months)    (months)
1     44       3C     Ductal       3              3        3.8     3+     -   -   None        38 (NED)
2     47       2B     Ductal       3              2        10.7    3+     -   -   None        141 (NED)
3     40       2B     Ductal       3              3        NA      0      -   -   None        74 (NED)
4     41       3C     Ductal       3              3        1.0     0      -   -   None        40 (NED)
5     50       2B     Ductal       3              3        NA      0      -   -   None        91 (NED)
6     58       2A     Ductal       3              2        NA      0      -   -   40          84 (DOD)
7     52       2B     Ductal       3              3        1.5     1+     -   -   42          55 (DOD)
8     92       2A     Ductal       3              3        5.4     3+     -   -   None        34 (DOC)
9     61       2B     Metaplastic                 3        1.0     0      -   -   48          89 (AWD)
10    64       2A     Ductal       3              3        NA      0      -   -   None        92 (NED)
11    54       3A     Ductal       3              3        NA      1+     -   -   None        66 (NED)

aData represent ratio of HER-2/chromosome 17 copy numbers.

IHC: immunohistochemistry; ER: estrogen receptor; PR: progesterone receptor; FISH: fluorescence in situ hybridization; NA: not available; NED:
no evidence of disease; DOD: dead of disease; AWD: alive with disease; DOC: dead of other causes; -: negative.

binding with 125I-labelled EGF, and high-affinity
sites were quantitated. Tsutsui et al12 used a primary
EGFR monoclonal antibody (Kyokutou Seiyaku,
Tokyo, Japan) for assessing EGFR expression, and
interpreted overexpression as ‘tumors exhibiting
definite staining of the cancer cells’. In our current
study, tumors with 1+ staining intensity were
interpreted as negative for overexpression. Our
stringent criteria in defining EGFR overexpression
appeared to be the major contributing factor to the
apparent low prevalence of EGFR overexpression
among breast carcinomas in this study.

We  found  no  correlation  of  EGFR  amplification 
and  HER-2  status.  Of  the  11  tumors  showing  EGFR 
gene  amplification,  three  tumors  (27%)  showed 
HER-2  overexpression.  These  three  tumors  also 
showed  HER-2  gene  amplification.  This  proportion 
of  HER-2  positivity  approximates  the  expected 
percentage  in  breast  cancers  in  general.  The  11 
EGFR-amplified  tumors  were  uniformly  estrogen 
receptor/progesterone  receptor-negative,  consistent 
with  findings  by  other  investigators.23 
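
The 27% figure can be re-derived from the rows of Table 3. The Python sketch below is only an illustration: the tuple layout and the function name are assumptions made here, HER-2 overexpression is taken to mean an IHC score of 3+ as listed in the table, and FISH ratios reported as NA are stored as None.

```python
# Rows transcribed from Table 3:
# (case no., HER-2 FISH ratio or None when NA, HER-2 IHC score)
TABLE3 = [
    (1, 3.8, "3+"), (2, 10.7, "3+"), (3, None, "0"), (4, 1.0, "0"),
    (5, None, "0"), (6, None, "0"), (7, 1.5, "1+"), (8, 5.4, "3+"),
    (9, 1.0, "0"), (10, None, "0"), (11, None, "1+"),
]


def her2_overexpressing(rows):
    """Return case numbers scored IHC 3+ (assumed definition of overexpression)."""
    return [case for case, _ratio, ihc in rows if ihc == "3+"]


positive = her2_overexpressing(TABLE3)
# Proportion of EGFR-amplified tumors that also overexpress HER-2, in percent.
percent = round(100 * len(positive) / len(TABLE3))
print(positive, percent)
```

The same three cases are the only ones with a HER-2/chromosome 17 ratio well above 1, which matches the statement that the overexpressing tumors also showed HER-2 gene amplification.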

There  are  contradictory  reports  in  the  literature  on 
the  prognostic  significance  of  EGFR  overexpression 
and its relationship with known prognostic factors.25-28
In the only study that examined the
survival  impact  of  EGFR  gene  amplification,  no 
correlation  was  found.23  The  clinical  significance  of 
EGFR  amplification  and/or  EGFR  overexpression 
could  not  be  independently  evaluated  in  our  current 
study  due  to  the  small  number  of  informative  cases. 

Low-level amplification of EGFR in concert with
EGFR mutation is present in some lung adenocarcinoma
cell lines29 and we (M Ladanyi, unpublished
data) and others have also observed that many
clinical lung cancer samples show evidence of copy
number gains of the mutant allele.30 Based on these
considerations, it was of interest to screen the
EGFR-amplified tumors in the present study for
the activating mutations in exons 19 and 21 that are
commonly detected in lung cancers. However, no
mutations were found.

EGFR  gene  amplification  generally  results  in 
increased  protein  expression  in  breast  carcinomas. 
Apparent  EGFR  protein  overexpression  without 
gene  amplification  occurred  in  only  2%  of  tumors 
in  this  study,  and  its  mechanism  needs  to  be  further 
investigated.  Overall,  approximately  6%  of  breast 
carcinomas  show  moderate-  to  low-level  EGFR 
amplification  associated  with  genuine  EGFR  protein 
overexpression.  A  small  minority  of  breast  cancers 
could  be  responsive  to  EGFR-targeted  therapy,  and 
this  carefully  selected  subset  of  patients  should  be 
considered for clinical trials evaluating EGFR antibodies
or small inhibitory molecules.
We used bioluminescence imaging to reveal patterns of metastasis formation by human breast cancer cells
in immunodeficient mice. Individual cells from a population established in culture from the pleural effusion
of a breast cancer patient showed distinct patterns of organ-specific metastasis. Single-cell progenies
derived from this population exhibited markedly different abilities to metastasize to the bone, lung, or adrenal
medulla, which suggests that metastases to different organs have different requirements. Transcriptomic profiling
revealed that these different single-cell progenies similarly express a previously described “poor-prognosis”
gene expression signature. Unsupervised classification using the transcriptomic data set supported the
hypothesis that organ-specific metastasis by breast cancer cells is controlled by metastasis-specific genes that
are separate from a general poor-prognosis gene expression signature. Furthermore, by using a gene expression
signature associated with the ability of these cells to metastasize to bone, we were able to distinguish primary
breast carcinomas that preferentially metastasized to bone from those that preferentially metastasized
elsewhere. These results suggest that the bone-specific metastatic phenotypes and gene expression signature
identified in a mouse model may be clinically relevant.


Introduction 

Cancer metastases are responsible for the majority of cancer-related
deaths. A widely held hypothesis is that cancer metastasis arises
from rare cells in the primary tumor that acquire the ability to
progress through sequential steps necessary to grow at a distant
site (1, 2). Some of these sequential steps include invasion through
extracellular matrix, intravasation, survival in the circulation,
extravasation into a distant site, and progressive growth at that
site. Consistent with the multistep nature, there is experimental
and clinical evidence to suggest that metastasis is an inefficient
process whereby the vast majority of circulating tumor cells are
not able to progressively grow at distant sites (3-6). Related to this
is the observation that metastatic cells exhibit tissue tropism, preferring
to grow in certain organs in a way that cannot be explained
by circulatory patterns alone. In breast cancer, for example, metastasis
affects the bone and the lung, and less frequently the liver,
brain, and adrenal medulla. Although the genetic basis of these
metastatic properties is poorly understood, acquisition of the ability
to complete each step involved in metastasis is thought to be
driven by the accumulation of genetic mutations that may result
in a rare cell's acquisition of a full complement of these mutations
relatively late during the evolution of the primary tumor (1).

Recently, the development of DNA microarray technology, which
allows for genome-wide transcriptomic profiling, has provided new
insight into the genetic basis of metastasis. Studies using primary
tumor material have identified a gene expression signature for
breast cancer metastasis consisting of a set of 70 genes (7, 8). The
presence of this “poor-prognosis” signature in the primary tumor
from early-stage breast cancer patients is highly prognostic for the
development of distant metastasis and overall survival. Work using
adenocarcinoma metastases and unmatched primary tumors from
breast and other tumor types has revealed similar findings (9).

The fact that the poor-prognosis signature from early-stage primary
cancers can be used to predict the development of distant
metastasis has been interpreted as challenging the traditional
model of metastasis because it suggests that metastatic cells may
result from many of the early oncogenic events that drive primary
tumor growth rather than developing from late-arising, rare
cells that accumulate genomic alterations specific for metastasis
(10). Other researchers have maintained the existence of distinct
metastasis genes and have argued that a poor-prognosis signature
may result from the aggregate contribution of these genes by
subpopulations of cells that aberrantly express some but not all
of the multiple genes required to complete metastasis (11). Thus,
the cell that contains the full complement of metastasis-enabling
genes still may be rare. Regardless, the ability of the poor-prognosis
genes to directly mediate metastasis remains unknown.

Using in vivo selection of organ-specific metastatic cells from
the human breast cancer cell line MDA-MB-231, we recently identified
and functionally validated a set of genes that specifically
mediate osteolytic bone metastasis in the mouse (12). Cells that
express these genes and that are capable of bone metastasis preexist
within the MDA-MB-231 parent line, which as a population
already carries the poor-prognosis signature. This cell line was
originally established as the total outgrowth of cells derived from
a pleural effusion of a patient who relapsed years after removal of
the primary tumor (13). In the present study, we investigate the
Figure 1

SCPs from MDA-MB-231 cells have a poor-prognosis gene expression signature. (A) Microarray expression data of 46 of the 70 poor-prognosis
genes (7) that are present on the Affymetrix U133A GeneChip for the MCF10A normal breast epithelial cell line, parental MDA-MB-231 cell line,
and various SCPs from MDA-MB-231. Each column represents a gene (denoted along the bottom) and each row represents a cell line (denoted
along the right). Genes of the poor-prognosis signature that are expressed at higher levels in poor-prognosis tumors are above the red line, and
those that are underexpressed are above the green line. Genes with low trust values due to low or absent expression are shaded in darker colors
(Trust; wedge). (B) Microarray expression data of primary human breast carcinoma from 63 patients treated at our institution who had at least
5 years of clinical follow-up and/or developed metastatic disease. Hierarchical clustering of the patients’ data was performed with the 46
poor-prognosis genes. Each column represents a patient and each row, a gene. The MDA-MB-231 cell line was included and is denoted by a blue
dot in the dendrogram. Those patients in the good-prognosis versus the poor-prognosis cluster are separated by the yellow line. (C) Five-year
metastasis-free survival data for the 63 patients classified according to the hierarchical clustering described in B. The P value shown in the graph
was calculated by the χ² test. (D) Dendrogram showing hierarchical clustering of the SCPs and MCF10A using the poor-prognosis genes. A scale
of the distance metric used is shown on the left.


relationship  between  this  bone  metastasis  signature,  the  general 
poor-prognosis  signature,  and  the  metastatic  activity  of  individual 
cells  from  the  parental  population  and  of  a  cohort  of  metastatic 
human  primary  tumors. 

Results 

Similar poor-prognosis gene expression signatures in different single cell-derived
progenies. The poor-prognosis gene expression signature for
breast cancer, which can be used to predict the development of distant
metastasis, consists of 70 genes, 58 of which are upregulated
and 18 of which are downregulated, and correlates closely with negative
estrogen receptor status (7). Most tumors in the poor-prognosis
group have only a fraction (on average, approximately one third)
of the 70 gene expression events that constitute the poor-prognosis
signature. Furthermore, these gene expression events often show
extensive variation among different tumors with a poor prognosis.
We recently reported that MDA-MB-231 cells, as directly obtained
from the American Type Culture Collection (ATCC), also have the
poor-prognosis signature. Of the 70 genes from this signature, 46
were present on the Affymetrix U133A GeneChip that we used for
our microarray analysis (Figure 1A). Of the 58 upregulated genes of
the poor-prognosis signature, 36 were present on this microarray.
Compared with the MCF10A cell line derived from nonmalignant
human breast epithelium, the majority of these 36 genes were
upregulated in parental MDA-MB-231 cells. Of the 18 downregulated
genes from the poor-prognosis signature, 10 were present on
the U133A GeneChip. Consistent with downregulation in
poor-prognosis tumors, 7 of the 10 had low trust values due to their low
or absent expression.

To further confirm that MDA-MB-231 cells have a poor-prognosis
gene expression signature, we compared the transcriptomic profile
of these cells with that of a cohort of primary breast carcinomas
from patients treated at the Memorial Sloan-Kettering Cancer Center.
All of these patients had at least 5 years of clinical follow-up or
had developed metastatic disease. Hierarchical clustering using the
poor-prognosis gene expression signature (7) separated these tumors
Figure 2

Noninvasive BLI to monitor the development of osteolytic metastases from the same mouse. (A-D) SCP2, a highly metastatic clone from
MDA-MB-231, was transduced with the luciferase-containing TGL reporter gene and was injected into the left cardiac ventricle of an immunodeficient
mouse. At the indicated times after xenografting, the bioluminescence signal was captured. The intensity of the signal, measured as photon flux,
is shown as a color scale. Images for days 0, 1, and 8 are displayed on the same scale, while the day-35 image is shown on a different scale
due to the exponential growth of the metastases. A metastasis to the right hindlimb is circled in red. (E) The growth kinetics of the right hindlimb
metastasis outlined by the red circle shown in B-D were quantified by measurement of photon flux. (F-H) A bioluminescence image (F) and a
skeletal x-ray image (G) were obtained on day 16 after xenografting. Images were superimposed (H) to demonstrate registration of the
bioluminescence signals with skeletal anatomy. (I-N) A superimposed image from day 45 (I and L) reveals extensive areas of osteolytic destruction that
correspond to bioluminescence signals. Magnification of regions outlined in red shows involvement of the femur/tibia, iliac crest of the pelvis,
and the sacrum (J and K), in addition to the vertebrae (M and N). The bioluminescence signal from the region outlined in yellow on the left lateral
projection (L) does not overlap with skeletal structures and originates from the adrenal gland (Figure 3, J-M).


into two major clusters, one cluster corresponding to patients with
a poor-prognosis signature and the other representing those with a
“good-prognosis” signature (Figure 1B). Consistent with previous
reports, patients in our cohort with a poor-prognosis signature had
a significantly worse 5-year metastasis-free survival than those with
the good-prognosis signature (Figure 1C). MDA-MB-231 cells fall
squarely within this poor-prognosis group (Figure 1B). Thus,
MDA-MB-231 cells express a typical poor-prognosis tumor profile.

Among the questions raised by these observations is whether the
particular set of poor-prognosis gene expression events presented
by a poor-prognosis tumor reflects the presence of this particular
pattern in the majority of malignant cells of the tumor or if
it reflects contributions from different cells in the population.
To address this question in the MDA-MB-231 case, we used various
single cell-derived progenies (SCPs) obtained from single-cell
cloning and analyzed them for the presence of a poor-prognosis
signature. Although there was some variation among the SCPs in
the expression levels of the genes that comprised the signature, the
SCPs maintained a set of poor-prognosis gene expression events
similar to that found in the ATCC population from which they
were derived (Figure 1A). A dendrogram of the SCPs using the
poor-prognosis gene set confirmed that the distance metric between the
SCPs was significantly less than the distance metric between the
whole group of SCPs and MCF10A (Figure 1D).
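
The comparison summarized by the dendrogram amounts to checking that pairwise distances among SCP profiles are smaller than the distances between the SCPs and MCF10A. A minimal Python sketch of that check follows. It assumes a correlation-based distance (1 minus Pearson r), which is a common choice for expression data but is not stated to be the metric used in this study, and the expression values and sample labels (SCP_a, SCP_b, SCP_c) are invented for illustration.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def distance(x, y):
    """Correlation distance (assumed metric): 0 for identical shape, 2 for opposite."""
    return 1.0 - pearson(x, y)


# Hypothetical log-expression values for five signature genes (not real data).
profiles = {
    "SCP_a": [2.1, 1.8, 2.5, 0.2, 0.1],
    "SCP_b": [2.0, 1.9, 2.3, 0.3, 0.2],
    "SCP_c": [2.2, 1.6, 2.6, 0.1, 0.3],
    "MCF10A": [0.3, 0.2, 0.4, 1.9, 2.2],
}

scps = [name for name in profiles if name != "MCF10A"]
within = [distance(profiles[a], profiles[b]) for a, b in combinations(scps, 2)]
to_normal = [distance(profiles[s], profiles["MCF10A"]) for s in scps]

# True when every SCP-to-SCP distance is below every SCP-to-MCF10A distance.
print(max(within) < min(to_normal))
```

With real data the same comparison would be run over all signature genes present on the array rather than five made-up values.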

Flow cytometry analysis of the parental MDA-MB-231 cell population
indicated that approximately 10% of cells in this population
Figure 3

Verification of macroscopic and microscopic metastases by fluorescence histology. (A-I) A pathological fracture involving the proximal tibia
(A-E) or vertebrae (F-I) is demonstrated by skeletal x-ray (A and B) and an overlay of this x-ray with BLI (B and G) from the same mouse as
that described in Figure 2. To confirm metastases, we performed whole-mount frozen sectioning. Regions corresponding to the fractured tibia
and vertebra were analyzed by H&E staining (C, D, and H) or unstained sections were analyzed for GFP fluorescence (E and I). (J-M) A lateral
projection of a bioluminescence image from day 45 (J) corresponding to the same image as that in Figure 2L reveals a signal originating from
the adrenal gland (green arrow), as shown by H&E staining (K). Magnification of the boxed region in K (L) and GFP fluorescence (M) of the left
adrenal gland are shown. (N-Q) Inspection of organs in the left upper abdominal quadrant with areas of bioluminescence signal (N) reveals a
focus of tumor growth in the pancreas (O). Magnification of the boxed region in O (P) and GFP fluorescence (Q) are shown.


expressed CXCR4 (data not shown), a product representative of the
bone metastasis gene expression signature (12). A similar percentage
of SCPs were found to overexpress CXCR4 (12). Thus, based on these
criteria at least, our single-cell cloning process did not introduce bias
in the selection of cell clones representing the parental population.

Noninvasive bioluminescence imaging of metastases. After intracardiac
injection of parental MDA-MB-231 cells into immune-deficient
mice, approximately 30% will develop osteolytic bone metastasis
that is evident by skeletal x-ray imaging (12). Subpopulations
that are more osteolytic than the parental population have been
obtained through a process of in vivo selection for bone metastasis
or by isolation of SCPs from parental MDA-MB-231 cells. However,
the sensitivity of skeletal x-ray in detecting nonosseous metastasis
is poor. Likewise, findings at necropsy may also fail to reveal small
and/or anatomically inconspicuous lesions. Indeed, at necropsy,
MDA-MB-231 cells are infrequently found to have metastasized to
nonosseous organs such as the adrenal medulla.


In order to better characterize the overall metastatic properties
of MDA-MB-231 SCPs and their relationships to both the
poor-prognosis and the bone metastasis gene sets, we used
luciferase-based, noninvasive bioluminescence imaging (BLI) and fluorescence
microscopy using a novel triple-modality reporter gene,
thymidine kinase, GFP, luciferase (TGL) (14). This artificial gene
encodes a triple fusion protein with herpes simplex virus 1 thymidine
kinase (HSV1-TK) fused to the N terminus of enhanced
green fluorescent protein (eGFP) and firefly luciferase fused to the
C terminus of eGFP. When transduced into cells, HSV1-TK allows
for nuclear imaging, eGFP can be utilized for fluorescence, and
luciferase allows for BLI.

SCP2 is a single cell-derived population of MDA-MB-231 cells
that produces aggressive osteolytic lesions by 8 weeks after left ventricular
cardiac injection into immunodeficient mice. As a test of
the sensitivity and resolution of the TGL reporter gene, we transduced
SCP2 with the TGL reporter and monitored the development
Figure 4

SCPs exhibit different abilities to metastasize to bone. (A and B) Each of the SCPs was labeled with the TGL reporter, and 1 × 10^5 cells were
injected into the left cardiac ventricle. At the indicated days after xenografting, bioluminescence images were acquired. (A) Representative mice
injected with a representative set of SCPs are shown in the supine position. Signal intensities for days 1, 4, and 8 are shown on equivalent
scales, while day 24 and day 48 are each on separate scales due to increasing signal strength and to avoid signal saturation. (B) The normalized
photon flux from the dominant signal originating from the hindlimbs, forelimbs, or pelvis of all the SCPs studied was measured over the indicated
time course. SCPs were ranked according to their growth kinetics in either bone or lung. SCPs with a higher rank order for bone are shown in
red, and those with a higher rank order for lung are shown in green. The bottom three SCPs for both bone and lung are classified as being the
least metastatic and are shown in blue.


of osteolytic metastases. Shortly after the injection of 1 × 10^5 cells
into the left cardiac ventricle, a diffuse whole-body bioluminescence
signal was detected (Figure 2A). This signal followed systemic blood
flow patterns, with areas of strongest signal probably corresponding
to organs receiving the highest percentage of cardiac output; namely,
kidney, liver, and brain. At day 1 after injection, much of the diffuse
signal disappeared; however, foci of arrested tumor cells could
be seen. These foci increased in number and intensity through the
first week (Figure 2, A-C). In particular, an increasing signal could
be detected in the hindlimbs that corresponded to primary areas for
the development of osteolytic metastasis (Figure 2, B-D, red circles).
This major hindlimb signal was quantified by measurement of photon
flux and demonstrated logarithmic growth (Figure 2E).
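
Growth of the kind quantified by photon flux can be summarized as the number of logs gained and an implied doubling time. The Python sketch below is a rough illustration that assumes exponential growth between two measurements. The flux values and time points are invented, and the function names are chosen here rather than taken from the study.

```python
import math


def log10_growth(flux_start, flux_end):
    """Number of orders of magnitude gained between two flux measurements."""
    return math.log10(flux_end / flux_start)


def doubling_time_days(flux_start, flux_end, days):
    """Doubling time implied by exponential growth over the given interval."""
    return days * math.log(2) / math.log(flux_end / flux_start)


# Hypothetical photon flux (photons/s) at day 8 and day 35; not measured data.
flux_day8, flux_day35 = 1e5, 1e8
logs = log10_growth(flux_day8, flux_day35)
td = doubling_time_days(flux_day8, flux_day35, 35 - 8)
print(round(logs, 2), round(td, 2))
```

On a logarithmic flux axis, exponential growth of this kind appears as a straight line whose slope is the number of logs gained per day.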

Because bioluminescence signals could be correlated only with
surface anatomy, we sought a way to assign major areas of bioluminescence
to anatomical structures. At day 16 after injection, we overlaid
the bioluminescence signal with skeletal x-ray images in order to
analyze the correlation between areas of signal with skeletal anatomy.
The majority of the signal overlapped well with bony structures,
including the distal femur/proximal tibia, bony pelvis, scapula, vertebra,
distal ulna, and skull (Figure 2, F-H). Although inspection of the
x-ray images at day 16 did not reveal evidence of osteolytic destruction
at the sites of overlap, skeletal x-ray imaging of the same animal
at day 45 demonstrated overlapping areas with extensive osteolytic
destruction involving the distal femur/proximal tibia, iliac crest,
sacrum, and vertebral body (Figure 2, I-N). Thus, these data suggest
that BLI can be significantly more sensitive in detecting bone metastasis
than x-ray imaging, as it allows monitoring of the development
of bone metastasis from initial arrest to osseous destruction.

Verification of BLI by fluorescence histology. In order to examine the
regions of osteolytic metastasis histologically and to search for
other, less obvious sites of occult metastases, we used whole-mount
frozen sectioning to look for tumor-derived GFP fluorescence by
microscopy. Skeletal x-ray and BLI identified a pathological fracture
of the tibia (Figure 3, A and B). H&E staining of sections corresponding
to this region revealed tumor cells eroding through the cortex
of the tibia (Figure 3, C and D), and GFP fluorescence of a serial
section confirmed the metastasis (Figure 3E). Similarly, a collapsed
vertebral body was also demonstrated to be due to growth of tumor
cells through the bone and into the spinal canal (Figure 3, F-I).

Not all areas of bioluminescence signal could be overlaid with skeletal
structures. For example, as shown on day 35 after xenografting,
Figure 5

Differential ability among SCPs to metastasize to the adrenal gland. (A) After intracardiac injection of individual SCPs, bioluminescence images
were acquired and analyzed for signals originating from regions consistent with adrenal metastasis (arrows). Shown are representative mice
at 7 weeks after injection with SCPs that show varying abilities to give rise to adrenal metastasis. (B) At necropsy, left and right adrenal glands
(with the kidneys) were removed and were imaged ex vivo for bioluminescence. Arrows show the locations of the left and right adrenal glands,
respectively, from a representative mouse with adrenal metastasis.


bioluminescence signals on both sides lateral to the vertebral column
could be detected (Figure 2D). On a lateral projection, these
signals lay anterior to the vertebrae (Figures 2L and 3J). At necropsy,
enlarged and necrotic adrenal glands were noted (Figure 3, K and L),
and fluorescence microscopy confirmed that this was due to metastasis
(Figure 3M). In addition, careful analysis of whole-mount frozen
sections also identified other nonosseous sites of microscopic metastases
corresponding to weak regions of bioluminescence signal. For
example, small foci of signal were noted in the upper left quadrant
of the abdomen (Figure 3N). This signal was confirmed to be due to
microscopic metastasis involving the pancreas (Figure 3, O-Q).

In total, these data demonstrate that the TGL reporter gene
enables the use of a noninvasive method for tracking metastases
from the initial arrest in distant organs to the development
of gross lesions. The growth of these lesions can be quantified by
measuring photon flux and confirmed by fluorescence microscopy.
The sensitivity of the system is exemplified by the ability to
detect and confirm microscopic metastases that would otherwise
be overlooked by routine necropsy.

Differential bone-metastatic activity with a similar poor-prognosis signature.
Empowered by the sensitivity of the TGL reporter system, we
sought to fully characterize the metastatic phenotypes of the SCPs.
To assess the metastatic activity that develops after hematogenous
spread, we introduced each of the SCPs into the arterial circulation
of immunodeficient mice by injection into the left cardiac ventricle.
The major site of colonization and growth among the SCPs is the
bone (hindlimbs, ribs, pelvis/sacrum, and skull/mandible) (Figure
4A). However, the SCPs displayed significant variation in their ability
to grow in bone, even though the various SCPs proliferated in
culture at comparable rates (data not shown). The dominant signals
on the supine projections came from the hindlimbs and the
bones of the skull. For presentation purposes, the bioluminescence
data from days 1-8 are displayed on the same scale and day 24 and
day 48 are each displayed on a different scale. Comparisons within
these groups across SCPs demonstrated that SCP2 and SCP46 were
more metastatic to bone than were SCP3 and SCP26.

The dominant hindlimb lesion from the complete set of SCPs
was quantified by measurement of photon flux, and the kinetics of
growth are shown in Figure 4B. The aggressiveness of SCP2, SCP25,
SCP28, and SCP46 in forming bone metastasis was shown by a 3- to
4-log growth of the dominant hindlimb lesion over the course of 7
weeks. Most of these mice became cachectic and were sacrificed. The
aggressive nature of these SCPs is consistent with their expression
of a previously described bone metastasis gene expression signature


Table 1

Adrenal metastases and SCPs

Progeny   Number of mice   Number of mice with
          analyzed         adrenal metastases (%)

SCP2      4                2 (50)
SCP3      9                7 (78)
SCP25     4                1 (25)
SCP6      5                0
SCP32     4                0
SCP43     5                0
SCP21     5                0
SCP26     4                0
SCP28     5                0
SCP46     4                0

The presence of adrenal metastasis was determined for the entire
cohort of SCPs.
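
The percentages in Table 1 follow directly from the counts. The small Python sketch below recomputes them and picks out the progeny with the highest rate. The counts are transcribed from the table, while the helper name `adrenal_rate` and the data layout are choices made here for illustration.

```python
# (progeny, mice analyzed, mice with adrenal metastases), transcribed from Table 1
TABLE1 = [
    ("SCP2", 4, 2), ("SCP3", 9, 7), ("SCP25", 4, 1), ("SCP6", 5, 0),
    ("SCP32", 4, 0), ("SCP43", 5, 0), ("SCP21", 5, 0), ("SCP26", 4, 0),
    ("SCP28", 5, 0), ("SCP46", 4, 0),
]


def adrenal_rate(analyzed, with_mets):
    """Percentage of analyzed mice that developed adrenal metastases."""
    return 100.0 * with_mets / analyzed


# Rounded percentages per progeny, keyed by progeny name.
rates = {name: round(adrenal_rate(n, k)) for name, n, k in TABLE1}
most_consistent = max(rates, key=rates.get)
print(rates["SCP2"], rates["SCP3"], rates["SCP25"], most_consistent)
```

With only four to nine mice per progeny, these rates are coarse estimates, so differences between progenies with nonzero rates should be read cautiously.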
Figure 6

SCPs demonstrate different abilities to metastasize to the lung. (A-C) Each of the SCPs was labeled with the TGL reporter, and 2 × 10^6 cells
were injected into the tail vein. At the indicated day after xenografting, bioluminescence images were acquired. (A) Representative mice injected
with a representative set of SCPs are shown in the supine position. The intensity of the signal from day 0 is displayed on one scale, while that
of days 14 and 49 (Day ≥14) is on a different scale due to increasing signal strength and to avoid signal saturation. (B) The normalized photon
flux from the lung of all the SCPs studied was measured over the indicated time course. SCPs are color-coded as described in Figure 4B. (C)
Lungs of mice injected with SCPs that show growth in lung were analyzed histologically. A lung section from a representative SCP is shown stained for CD31, a
marker for vascular endothelial cells, and counterstained with eosin. Asterisks mark regions of parenchymal tumor growth. The red arrow shows
a CD31-positive blood vessel with an associated perivascular tumor growth pattern.


(12). However, SCP43, SCP3, and SCP32 were weaker in their metastatic
growth to bone, while SCP6, SCP26, and SCP21 were the most
weakly metastatic to bone. This reduction in bone metastasis ability
correlated with the attenuation in expression of the bone metastasis
genes (see Figure 7D). Interestingly, even among the weakest
populations, we were able to detect the presence of bone metastasis.
For example, at 14 weeks after xenografting of SCP26, a dormant
metastatic focus within the hindlimbs was detectable in half of the
mice (Figure 4A and Supplemental Figure 1; supplemental material
available online with this article; doi:10.1172/JCI200522320DS1).
Thus, these data demonstrate that the bone-metastatic activity of
MDA-MB-231 cells does not correlate with the expression of their
poor-prognosis signature but instead with the expression of our previously
described bone metastasis gene set.

Different organ specificity of metastasis by different cells from the same
population. After extensive analysis of metastatic growth by BLI,
whole-mount fluorescence microscopy, and micro-positron emission
tomography (data not shown), we found bone to be the major site of tumor
growth after arterial inoculation. In general, growth in other organs was
rare, making comparable analysis unfeasible. However, one exception was
metastatic growth in the adrenal gland, which occurred at an appreciable
frequency. We were able to detect adrenal metastases in a minority of the
SCPs by looking for dorsally located signals on either or both sides of
the vertebral column that were suspicious for adrenal metastases (Figure
5A). These "suspicious" signals were confirmed at necropsy by gross
inspection and/or ex vivo BLI (Figure 5B). Of the SCPs, SCP3 was the most
consistent in producing adrenal metastasis (Table 1).

Due to size restrictions imposed by murine capillaries, human
tumor cells are rarely able to pass from the arterial to the venous
system (or vice versa) by way of the lungs (2). Therefore, we injected
the SCPs into the tail vein in order to study the ability of SCPs to

Figure 7

Genome-wide "unsupervised" classification of the SCPs correlates with metastatic phenotype. (A) A multidimensional scaling plot illustrates
the relationship between the various SCPs and their primary metastatic tropism based on genes that are differentially expressed across the
SCPs starting from the more than 22,000 present on the Affymetrix U133A GeneChip. SCPs are color-coded according to their primary
metastatic tropism (green for lung, red for bone, and blue for weakly metastatic). The plot demonstrates that SCPs with the same primary metastatic
tropism group together in 3-dimensional space. Each group is enclosed in a circle. MCF10A is shown by itself (gold dot). (B) Hierarchical
clustering of the SCPs based on differentially expressed genes reveals similar relationships and a similar association with metastatic tropism, as
summarized in the table below the dendrogram. (C) A Venn diagram demonstrates the relationship between the genes differentially expressed
across the SCPs and a previously described bone metastasis gene set. Of 1,267 differentially expressed genes, 50 of the 127 bone metastasis
genes (102 are unique) overlap. (D) A Northern blot showing the expression levels of 4 of the bone metastasis genes among the SCPs used in
this study (boxed and labeled by SCP, with the color of the label corresponding to tissue tropism). GAPDH, loading control.


metastasize to the lung. Shortly after tail vein injection, all
detectable cells became trapped in the lung (Figure 6A). Within the
first few days, there was a substantial attenuation of this signal.
In SCP6 and SCP26, this attenuation continued over the ensuing
weeks, suggesting that, as in the bone, these SCPs were unable to
efficiently survive and grow in the lung. The highly bone-metastatic
populations SCP2 and SCP46 were also unable to grow in the lung
but were able to survive over the course of several weeks, as shown
by their persistent bioluminescence signal. In contrast, SCP3 and
SCP28, and to a lesser extent SCP32 and SCP43, were able to grow
in the lung. To confirm the presence of lung metastases, we performed
histological analysis. Immunohistochemistry with CD31, which is a
marker for vascular endothelial cells, revealed multiple areas of
perivascular tumor growth and growth within the capillary-rich lung
parenchyma (Figure 6C).

It is hypothesized that growth at metastatic sites is enhanced by
genes that confer productive tumor-stroma interaction. Thus,
metastatic cells that grow well at one site may not grow well at another.
Based on the metastatic tropisms of each SCP defined by BLI, we
ranked SCPs according to their growth kinetics in either bone or
lung. As shown in Figures 4B and 6B, SCPs with a higher rank order
for bone were color-coded in red, and those with a higher rank order

Figure 8

Segregation of primary breast carcinomas using the bone metastasis gene
signature. The microarray data for primary breast tumors from patients
that developed distant metastasis were used in hierarchical clustering
using the 50 bone metastasis genes described in Figure 7C. Both the
patient samples (columns) and the genes (rows) were clustered. The
patient samples were classified into two major clusters with an overall
R index of 0.90 (a robustness index; see Methods). The site(s) of distant
recurrence for each patient is (are) listed along the bottom, with the
site of first recurrence listed first. The genes were clustered into six
groups (labeled along the left), with the gene symbol of each gene shown
(on the right). The asterisks indicate genes without symbols (from top to
bottom, Affymetrix probe set identifiers 211429_s_at and 211796_s_at).
Orbit, orbit of the eye.


for lung were coded in green. The bottom three SCPs for both bone
and lung growth were classified as least metastatic and were coded
in blue. Consistent with the concept of metastatic tissue tropism,
the SCPs that were the best at growing in bone were not the best at
growing in lung, and the most lung-metastatic SCPs generally were
not the most metastatic to bone (Figures 4B and 6B).
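The color-coding rule described above can be written out as a short sketch. The function name, the clone labels, and the rank values below are hypothetical and are not taken from the study; the handling of ties (assigned to green) is also an assumption, since the text does not address it.

```python
def classify_tropism(bone_rank, lung_rank, n_weakest=3):
    """Assign a color code from per-organ growth ranks (1 = strongest growth).

    bone_rank, lung_rank: dicts mapping clone name -> rank in that organ.
    Clones in the bottom `n_weakest` ranks for BOTH organs are coded blue;
    otherwise the organ with the better (lower) rank decides the color.
    """
    n = len(bone_rank)
    cutoff = n - n_weakest  # ranks greater than this fall in the bottom group
    colors = {}
    for name in bone_rank:
        if bone_rank[name] > cutoff and lung_rank[name] > cutoff:
            colors[name] = "blue"   # least metastatic in both organs
        elif bone_rank[name] < lung_rank[name]:
            colors[name] = "red"    # ranks higher for bone
        else:
            colors[name] = "green"  # ranks higher for lung (ties: assumption)
    return colors


# Hypothetical ranks for five illustrative clones, with a bottom group of two.
bone = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
lung = {"C": 1, "A": 2, "B": 3, "D": 5, "E": 4}
print(classify_tropism(bone, lung, n_weakest=2))
```

Ranking by growth kinetics is itself a judgment made from the BLI curves, so the sketch starts from ranks rather than from raw photon flux.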

In summary, extensive analysis of the metastatic activity of the various
SCPs derived from the same cancer cell line has revealed significant
variability in their metastatic activity. This variability is seen in
cell survival, organ-specific colonization, and organ-specific growth.

Genome-wide variation correlates with organ-specific metastatic phenotype.
The presence of the poor-prognosis signature in the various SCPs does
not strongly correlate with any recognizable aspect of their metastatic
activity, which supports the hypothesis that many characteristics of
metastatic activity are governed by a different set(s) of genes. To test
this idea, we first analyzed the SCP microarray data to estimate the
amount of variation in the expression levels of the more than 22,000
genes represented on the Affymetrix U133A GeneChip. After filtering out
genes in which more than half of the SCPs showed less than 1.5-fold
change in expression level and eliminating genes that were absent in all
of the datasets, 1,267 differentially expressed genes remained. A
higher-stringency filter that required a minimum twofold change in
expression reduced this list further to 286 genes. Multidimensional
scaling was then used to determine the relatedness of the different SCPs
based on the 1,267 broadly differentially expressed genes. SCP2, SCP25,
and SCP46 formed one distinct group in three-dimensional space, while
SCP28, SCP3, SCP32, and SCP43 formed another (Figure 7A). A third group
was formed by SCP6, SCP21, and SCP26. Although distinct, these three
groups of SCPs were significantly closer to each other than they were to
MCF10A. As expected, hierarchical clustering also revealed similar
relationships (Figure 7B). Both the multidimensional scaling and the
hierarchical clustering were repeated with the more stringent 286-gene
list and numerous other filtered lists and gave similar results (data not
shown). Interestingly, both of these unsupervised methods (i.e., methods
wherein knowledge of class assignments is not used in the analysis)
defined groups that reflected the BLI-assigned primary metastatic
tropisms of the SCPs, as shown by the color coding. The group formed by
SCP28, SCP43, SCP3, and SCP32 was mainly metastatic to the lung (green),
while the group formed by SCP46, SCP2, and SCP25 exhibited aggressive
metastatic growth in the bone (red). The least metastatic of the SCPs,
SCP6, SCP21, and SCP26, formed the third group (blue). Some SCPs showed
significant multi-tropic properties (Figure 7B). Thus, the "unsupervised"
separation of the SCPs into broad groups that correlate with primary
properties of their metastatic phenotypes supports the notion that
distinct gene expression patterns are responsible for the variability
seen in their metastatic activities.
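The variation filter described above can be illustrated with a minimal sketch. It assumes linear-scale (not log-transformed) expression values and a per-gene mean as the reference level; the function names and the toy values are hypothetical, and the filtering tool actually used may compute fold change differently.

```python
from statistics import mean


def passes_variation_filter(values, present_calls, fold=1.5, min_fraction=0.5):
    """Return True if a gene is kept by the variation filter.

    values: linear-scale expression values, one per sample (assumed > 0).
    present_calls: booleans, True where the detection call is not "absent".
    A gene is dropped if it is absent in every sample, or if fewer than
    `min_fraction` of samples differ from the gene's mean by >= `fold`.
    """
    if not any(present_calls):
        return False
    centre = mean(values)
    n_changed = sum(
        1 for v in values if v >= centre * fold or v <= centre / fold
    )
    return n_changed >= min_fraction * len(values)


def filter_genes(expression, calls, fold=1.5):
    """Apply the filter to dicts keyed by gene; returns the kept gene names."""
    return [
        gene
        for gene, values in expression.items()
        if passes_variation_filter(values, calls[gene], fold=fold)
    ]


# Toy data: gene_1 varies strongly across samples, gene_2 is nearly flat.
expression = {"gene_1": [10, 10, 100, 100], "gene_2": [100, 105, 95, 100]}
calls = {gene: [True] * 4 for gene in expression}
print(filter_genes(expression, calls, fold=1.5))
```

Raising `fold` to 2.0 corresponds to the higher-stringency list; a gene can pass the 1.5-fold filter and fail the twofold one, which is why that list is shorter.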

To validate that metastasis-specific genes were among the 1,267
differentially expressed genes, we determined how many of the
102 unique genes from our previously described (12) and independently
derived bone metastasis gene set (represented on the
U133A GeneChip by 127 probe sets) were among the 1,267 genes.
As seen in the Venn diagram in Figure 7C, 50 of the 127 bone
metastasis genes were overlapping. This set of 50 included IL11,
CTGF, and CXCR4, three genes that were determined to specifically
cause bone metastasis (12). Accordingly, the expression of these
genes strictly correlated with bone-specific growth (Figure 7D).
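The overlap behind a Venn diagram of this kind is a set intersection, with the added wrinkle that several probe sets can map to one gene. The sketch below uses hypothetical probe identifiers and gene names purely for illustration; none of them come from the study.

```python
def signature_overlap(signature_probe_to_gene, filtered_probes):
    """Intersect a signature with a filtered probe list.

    signature_probe_to_gene: dict mapping each signature probe set to its gene.
    filtered_probes: iterable of probe sets that passed the variation filter.
    Returns (shared probe sets, unique genes those probe sets represent).
    """
    shared_probes = set(signature_probe_to_gene) & set(filtered_probes)
    shared_genes = {signature_probe_to_gene[p] for p in shared_probes}
    return shared_probes, shared_genes


# Hypothetical identifiers: two probe sets report on the same gene.
signature = {
    "probe_1": "GENE_A",
    "probe_2": "GENE_A",
    "probe_3": "GENE_B",
    "probe_4": "GENE_C",
}
filtered = ["probe_1", "probe_2", "probe_4", "probe_9"]
probes, genes = signature_overlap(signature, filtered)
print(len(probes), "shared probe sets representing", len(genes), "unique genes")
```

Counting probe sets and counting unique genes give different numbers, which is why a caption may report both.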

Segregation of primary tumors using a bone metastasis gene expression
signature. The existence of a poor-prognosis gene expression signature
from the bulk expression data of primary breast cancers suggests that the
emergence of cells that express metastasis genes may occur early during
tumorigenesis. Therefore, we wanted to determine whether the bone
metastasis genes that we identified in our MDA-MB-231 model system in the
mouse (12) could be detectable within primary breast carcinomas. To this
end, we used the 50 bone metastasis genes expressed among the
bone-metastatic SCPs. Hierarchical clustering of all 63 primary breast
tumors in our cohort did not robustly distinguish those tumors that gave
rise to bone metastasis from those that did not (data not shown). This
suggests that either our bone metastasis signature carries little
predictive value or our genes are expressed only by an undetectable
subpopulation of tumor cells.

To help distinguish between these two possibilities, we restricted
our analysis to those primary tumors that gave rise to distant
metastasis (mainly to bone and/or to lung) (Figure 8). Under these
conditions, the 50 bone metastasis genes could be used to divide the
primary breast carcinoma groups into two major clusters with an
overall reproducibility index (R index) of 0.90, which is indicative of
the robustness of this cluster. The primary breast carcinomas that
gave rise to bone metastasis were predominantly associated with
the second cluster. In contrast, those samples that produced lung
metastasis were mainly grouped together by the first cluster. The 50
bone metastasis genes were also clustered together into six groups
based on similarity in their expression pattern. Gene cluster 2
represented genes that were generally upregulated in the primary tumors
that developed bone metastasis. Genes in this cluster included CTGF
and IL11, in addition to other genes that are upregulated in the
bone-metastatic SCPs, including NAP1L3, DUSP1, ADAMTS1, and
SOCS2 (Supplemental Table 1). Some genes that are upregulated
in the bone-metastatic SCPs, such as MMP1, are not selectively
upregulated in the breast carcinoma primary tumors that develop
bone metastasis; for example, MMP1 is also involved in lung
metastasis (our unpublished observations). The failure of other genes to
display concordant expression patterns in the SCPs and the breast
primary tumors may be because they are not biologically relevant
and/or because of unknown peculiarities of the clinical data set or
the MDA-MB-231 model system. Nonetheless, these data suggest
that the development of distant sites of metastasis in breast cancer
patients is related to differences in the gene expression pattern that
are discernible by our bone metastasis gene expression signature.

Discussion 

In this study, we have demonstrated that SCPs from a metastatic
parental breast cancer population carry a poor-prognosis signature.
This signature varied little from SCP to SCP; however, the
metastatic activity of different SCPs varied significantly. With the
sensitivity afforded by noninvasive BLI coupled with fluorescence
microscopy, we were able to fully characterize the metastatic
activities of individual SCPs by evaluating tissue tropism and growth
kinetics. We determined that some SCPs were capable of efficient
metastasis to bone, others metastasized better to lung, and a
minority were also able to colonize and grow within the adrenal
gland and/or other sites. This activity resembles the typical
distribution of breast cancer metastases observed in patients. Some
SCPs exhibit multiple tropisms, while others, in contrast, are only
weakly metastatic and/or give rise to dormant lesions. The presence
of cells with different metastatic properties from the same
pleural effusion-derived cell line may reflect an accumulation of
circulating tumor cells from multiple metastatic sites within the
pleural fluid of the patient from which the cells were derived.

Although we cannot rule out the possibility that minor variations
in the poor-prognosis signature may contribute to these
differences in metastatic phenotypes, hierarchical clustering
based on the poor-prognosis genes does not clearly segregate the
SCPs into different groups that correlate with particular aspects
of metastatic activity such as colonization and growth within
specific organs. This suggests that the genes that make up the
poor-prognosis signature do not control these more specific
metastatic properties. In contrast, hierarchical clustering based
on the entire gene expression data set does segregate the SCPs
into different groups with different organ tropisms. The
poor-prognosis signature was defined in a way that does not take into
account particular characteristics of metastasis such as tissue
tropism and growth kinetics. In a recent report comparing human
primary breast tumors to distant metastases from various organs,
the primary tumor showed extensive genetic similarity to the
distant metastasis from the same patient, and a "supervised" method
was unable to generate a classifier to distinguish primary tumors
from metastases (15). These results are in line with the concept
of a poor-prognosis signature; however, because the metastasis
samples were from various organs, the presence of site-specific
metastasis genes could not be determined. Thus, the poor-prognosis
signature may be composed of gene expression events
acquired early during primary tumor development that function
to endow tumor cells with baseline metastatic properties or that
mark a particular cell phenotype that is liable to express
metastatic functions. Indeed, MDA-MB-231 cells are derived from the
pleural effusion of a patient with widespread metastatic disease,
and all of the individual clones from this population that we
analyzed show at least some level of metastatic activity.

Based on the identification of metastasis genes associated with
osteolytic bone metastasis, our previous study proposed that in
addition to the poor-prognosis signature, metastatic cells need to
acquire a genetic "tool box," or a set of genes that confer the
functions necessary for efficient tissue-specific growth. The genes that
make up this "tool box" would be regarded as metastasis-specific
genes that are acquired through mutation or epigenetic changes.
However, the classification of genes into this category would
require a level of specificity such as tissue tropism. Our current
study provides support for this requisite, as the expression of these
genes strictly correlated with efficient bone metastasis and not with
other recognizable aspects of metastatic activity. In addition,
multidimensional scaling of genes that are differentially expressed across
SCPs defines groups that correlate with primary tissue tropism,
and our bone metastasis gene set overlaps with these differentially
expressed genes. We expect that within these differentially expressed
genes, a lung metastasis gene set will also exist (our unpublished
observations). Thus, SCPs with different genetic profiles can exhibit
marked differences in their ability to colonize and to grow
exponentially in various metastatic sites. These results support the idea
of the importance of productive tumor-stroma interactions that
foster metastatic growth, consistent with Paget's "seed and soil"
hypothesis (16), or interactions such as those between tumor and
vasculature that result in differential tissue arrest.
Some of the SCPs that we analyzed demonstrated the ability to
grow effectively at more than one metastatic site. For example,
SCP28 grew well in both the bone and the lung, SCP3 was metastatic
to both lung and adrenal, and SCP2 exhibited both bone and adrenal
tropism. In contrast, SCP46 was metastatic only to the bone. The
multi-tropic properties of metastatic cells raise the possibility that
metastatic cells from one site may spawn metastasis to another site.
Because there are limited clinical situations in which single
metastasis or oligometastasis is effectively treated by surgical excision,
knowledge of whether metastatic cells are single or multi-tropic may
be of important clinical relevance.

A metastatic cell must complete a series of sequential steps in order
to successfully colonize and grow at a distant site. Our data suggest
that the expression of a poor-prognosis signature can mark only a
baseline ability to accomplish some of these steps. The signature may
comprise genes related to the early oncogenic changes that drive
primary tumor formation, but it lacks genes that dictate organ-specific
metastatic activity. These additional metastasis genes provide
the capability to become fully metastatic and confer properties such
as organotropism. It is unclear whether these metastasis genes are
acquired during the growth of the primary tumor or during colonization
at a distant site (17). Indeed, our hierarchical clustering of a
mixed cohort of primary breast tumors with a bone metastasis gene
expression signature (12) did not allow robust classification of those
tumors that gave rise to bone metastasis versus those that did not.
Nonetheless, this signature was able to distinguish primary
breast carcinomas that preferentially metastasized to bone from
those that preferentially metastasized elsewhere. These results
suggest that the development of distant sites of metastasis in breast
cancer patients is related to differences in primary tumor gene expression
pattern that are discernible by our bone metastasis gene expression
signature. A further enrichment of the list of bone metastasis genes
may in the future allow accurate prediction of the bone metastasis
tropism of breast cancer primary tumors.

Methods 

Cell culture and retroviral gene transfer. MDA-MB-231 cells were obtained
from ATCC and were cultured in Dulbecco's modified Eagle's medium, high
glucose, supplemented with 10% FBS. SCPs were derived from MDA-MB-231
cells as described previously (12). The construction and retroviral gene
transfer of the triple-modality reporter gene TGL has been described
previously (14). In brief, 20 µg of the TGL reporter plasmid SFG-nesTGL was
transfected into the GPG29 packaging cell line with Lipofectamine 2000
(Invitrogen). Virus-containing supernatants were harvested between 72
and 96 hours, were filtered with a 0.45-µm syringe filter, and were used
to infect MDA-MB-231 SCPs for 12-24 hours in the presence of 8 µg/ml
of polybrene (Sigma-Aldrich). At 72 hours after infection, successful gene
transfer was confirmed by visualization of GFP by fluorescence microscopy.
These cells were enriched by fluorescence-activated cell sorting
(FACSVantage; Becton Dickinson). Luciferase activity was confirmed in vitro by
seeding of 1 × 10^5 cells into a 24-well plate followed by the addition of 0.03
mg of D-Luciferin (Xenogen). Luciferase activity was measured with the
IVIS Imaging System (Xenogen).

Mouse xenografting. For intracardiac injections, subconfluent cells were
harvested, washed in PBS, and resuspended at a concentration of 1 × 10^6 cells/ml.
BALB/c nude mice (NCI) were anesthetized by intraperitoneal injection
of ketamine (100 mg/kg) and xylazine (10 mg/kg) and were placed in the
supine position. With a 26-gauge needle, 1 × 10^5 cells were injected into
the left ventricle via the third intercostal space after visualization of arterial
blood flow into the syringe. For tail vein injections, unanesthetized mice
were warmed with a heat lamp to allow for venous dilation. Mice were then
placed into a plastic restraining apparatus, and 2 × 10^5 cells were injected via
the lateral tail vein. Successful injections were confirmed by immediate BLI.
All animal studies were performed in accordance with an IACUC-approved
protocol at the Memorial Sloan-Kettering Cancer Center.

BLI and analysis. Anesthetized mice were injected retro-orbitally with 75
mg/kg of D-Luciferin (Xenogen) in PBS. Bioluminescence images were
acquired with the IVIS Imaging System (Xenogen) at 2-5 minutes after
injection. Acquisition times at the beginning of the time course started at
60 seconds and were reduced in accordance with signal strength to avoid
saturation. Analysis was performed using LivingImage software (Xenogen)
by measurement of photon flux (measured in photons/s/cm^2/steradian)
with a region of interest (ROI) drawn around the bioluminescence signal
to be measured. For bone metastasis, an ROI was drawn around the major
bioluminescence signal from the hindlimb, forelimb, or pelvis/sacrum.
For lung metastasis, an ROI was used that encompassed the thorax of the
mouse. For determination of the "fold increase" above background,
average background measurements were obtained using the same ROI on a
corresponding region from control mice. Data were divided by the
average background measurement and were normalized to the signal obtained
immediately after xenografting (day 0).
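The two-step normalization described above can be sketched as follows. The function name and the flux values are hypothetical; the sketch assumes one ROI measurement per time point and a single average background value for that ROI.

```python
def normalize_photon_flux(raw_flux_by_day, background_flux):
    """Normalize ROI photon flux as described in the text.

    raw_flux_by_day: dict mapping day -> photon flux for one ROI
                     (photons/s/cm^2/steradian); must include day 0.
    background_flux: average flux of the same ROI in control mice.
    Returns (fold over background, fold over background relative to day 0).
    """
    fold_over_background = {
        day: flux / background_flux for day, flux in raw_flux_by_day.items()
    }
    day0 = fold_over_background[0]
    normalized = {day: value / day0 for day, value in fold_over_background.items()}
    return fold_over_background, normalized


# Hypothetical measurements for one mouse at days 0, 14, and 49.
raw = {0: 2.0e6, 14: 1.0e6, 49: 8.0e6}
fold, normalized = normalize_photon_flux(raw, background_flux=1.0e5)
print(fold)
print(normalized)
```

When the same background value is used at every time point it cancels in the day-0 ratio, so the normalized curve reflects only the change in signal relative to the cells initially delivered, while the fold-over-background values indicate how far a signal rises above noise.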

Histology. For whole-mount analysis, sacrificed mice were frozen in liquid
nitrogen and were stored at -80°C. Prior to frozen sectioning, tissue was
embedded in M1 embedding media (Shandon). Sections 20 µm in thickness
were mounted on slides and were fixed with 100% methanol for 30 seconds.
GFP was visualized in these mounted sections using a fluorescence
microscope. H&E staining was then performed on serial sections of interest. For
immunohistochemistry for CD31, lungs were fixed in 4% paraformaldehyde
overnight and were incubated in 30% sucrose for an additional 12-24 hours
prior to cryosectioning. CD31 staining was performed with the Discovery
AutoStainer (Ventana Medical Systems) and anti-CD31 (sc-1506; Santa
Cruz Biotechnology) at a concentration of 1 µg/ml.

DNA microarray analysis. Methods for RNA extraction, labeling, and
hybridization for DNA microarray analysis of the cell lines have been
described previously (12). For the primary breast tumor data, tissues
from primary breast cancers were obtained from therapeutic procedures
performed as part of routine clinical management. Samples were
"snap-frozen" in liquid nitrogen and were stored at -80°C. Each sample was
examined histologically with H&E-stained cryostat sections. Regions were
manually dissected from the frozen block to provide a consistent tumor
cell content of more than 70% in tissues used for analysis. All studies were
conducted under protocols approved by the Memorial Sloan-Kettering
Cancer Center Institutional Review Board. RNA was extracted from
frozen tissues by homogenization in TRIzol reagent (GIBCO-BRL; Invitrogen
Corp.) and was evaluated for integrity. Complementary DNA was
synthesized from total RNA using a T7 promoter-tagged dT primer. RNA target
was synthesized by in vitro transcription and was labeled with biotinylated
nucleotides (Enzo Biochem). Labeled target was assessed by hybridization
to Test3 arrays (Affymetrix).

All gene expression analysis was carried out using the Affymetrix U133A
chip. Analysis of the poor-prognosis signature was performed using
GeneSpring 6.1 (Silicon Genetics) with a list of genes from the 70 genes
comprising the poor-prognosis signature that are present on the U133A chip.
For multidimensional scaling and hierarchical clustering, Affymetrix data
were imported into BRB-ArrayTools 3.1 (developed by Richard Simon and
Amy Peng Lam; http://linus.nci.nih.gov/BRB-ArrayTools.html).
Hierarchical clustering was performed using either Euclidean distance or Pearson
correlation. Cluster reproducibility was reported as an R index (18). To
obtain a list of genes that are broadly differentially expressed among the
SCPs, we applied a filter to the 22,238 genes; this filter eliminated genes in
which expression levels differed by at least 1.5-fold (or, for the
higher-stringency list, twofold) from the mean expression level in fewer than
half of the data sets. An additional
filter was applied to eliminate genes with an absent detection call in all of
the datasets. The final filtered list comprised 1,267 genes (1.5-fold filter)
or 286 genes (twofold filter). This list was used in both multidimensional
scaling and hierarchical clustering. Other filtering criteria were also tested
and gave comparable results.
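The choice between Euclidean distance and Pearson correlation matters because the two metrics can group the same profiles differently. The sketch below is a plain-Python illustration, not the clustering tool's implementation: the average-linkage rule, the function names, and the toy profiles are all assumptions made for the example.

```python
import math
from itertools import combinations


def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 minus the Pearson correlation; 0 for perfectly correlated profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, distance):
    """Naive agglomerative clustering; returns the merges in order.

    profiles: dict mapping sample name -> list of expression values.
    Average linkage is an assumption made for this sketch.
    """
    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


# Toy profiles: "b" has the same shape as "a" at twice the magnitude,
# while "c" is close to "a" in magnitude but has the opposite trend.
profiles = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]}
print(average_linkage(profiles, euclidean_distance)[0])
print(average_linkage(profiles, pearson_distance)[0])
```

With these toy values the Euclidean metric first joins the two profiles of similar magnitude, whereas the correlation metric first joins the two profiles of similar shape, which is one reason results are often checked under both metrics.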

CXCR4 staining for flow cytometry. Subconfluent cells were trypsinized
and were washed twice in cold PBS. Phycoerythrin-conjugated anti-human
CXCR4 (BD Pharmingen) or control IgG was incubated in FACS
buffer (0.1% sodium azide and 1% bovine serum albumin in PBS) for 1
hour at 4°C. Cells were subsequently washed twice in PBS and, finally,
were resuspended in FACS buffer. Cells were analyzed by flow cytometry
using a BD FACSCalibur unit, and subsequent data analysis was done
using FlowJo software.
When breast cancer spreads to the bone, an osteolytic vicious cycle may arise
whereby tumor cells instigate local osteoclasts to mobilize bone-derived TGFβ
that further activates the tumor 1. TGFβ can signal by means of Smad transcription
factors 2, which are quintessential tumor suppressors that inhibit cell proliferation
3,4, and by means of Smad-independent mechanisms which are implicated in
tumor progression 5,6. Although Smad mutations disable this tumor suppressive
pathway in certain cancers, breast cancer cells frequently evade the cytostatic
action of TGFβ while retaining Smad function 3,4. Here we show that breast cancer
cells can use the Smad pathway to promote bone metastasis. Functional imaging
and immunohistochemical analysis reveal the presence of active Smad signaling
in mouse and human bone metastatic lesions. Smad signaling is shown to be
essential for the induction of the bone metastasis gene interleukin-11 (IL11), and
to significantly contribute to the formation of osteolytic bone metastases. AP1 is
a key participant in Smad-dependent transcriptional activation of IL11 and its
overexpression in bone metastatic cells. Our findings provide direct functional
evidence for a switch of the Smad pathway, from tumor-suppressor to
pro-metastatic, in the development of breast cancer bone metastasis.


TGFβ plays a crucial role as a growth-inhibitory cytokine in many tissues 3,4. The
cytostatic effect of TGFβ is mediated by a serine/threonine kinase receptor complex that
phosphorylates Smad2 and Smad3, which then translocate into the nucleus and bind
Smad4 to generate transcriptional regulatory complexes 2. SMAD4 (also known as
Deleted in Pancreatic Carcinoma locus 4, DPC4) and, to a lesser extent, SMAD2 suffer
mutational inactivation in a proportion of pancreatic cancers and colon cancers 3,4.
However, tumor cells that evade this anti-proliferative control by other mechanisms may
display an altered sensitivity to TGFβ and undergo tumorigenic progression in response
to this cytokine 3,4. Patients whose pancreatic or colon tumors express TGFβ receptors
fare less well than those with low or absent TGFβ receptor expression in the tumor 7. In
mouse models of breast cancer, TGFβ signaling promotes lung 8,9 and bone metastasis
10. Although the tumorigenic actions of TGFβ have been ascribed to Smad-independent
mechanisms 6, we investigated whether the Smad pathway mediates bone metastasis
in breast cancer.

Receptor-mediated phosphorylation of Smad2 at the C-terminus and accumulation of
phospho-Smad2 in the nucleus are typical indicators of TGFβ stimulation 2. To
determine whether this pathway is active in bone metastasis, metastatic tissues from
breast cancer patients were subjected to immunohistochemistry with
anti-phosphopeptide antibodies against receptor-phosphorylated Smad2. Bone metastasis
tissues from 16 breast cancer patients were obtained from therapeutic procedures
performed as part of routine clinical management of these patients at our institution.
Twelve of these samples showed prominent anti-phospho-Smad2 staining (Figure 1a),
and this staining was concentrated in the nucleus (Figure 1b-e). Nuclear
phospho-Smad2 staining was present both in the tumor cells and in cells of the
surrounding stroma (e.g. Figure 1b), suggesting that the entire field was under TGFβ
stimulation in these lesions. The other four metastasis samples analyzed showed little
or no staining. Thus, a majority of breast cancer bone metastases exhibited evidence of
Smad pathway activation.

Prompted by these results, we sought evidence for Smad-dependent transcriptional
activity in bone metastasis by functional imaging in a mouse xenograft model. This
model is based on the MDA-MB-231 cell line, which was derived from the pleural
effusions of a breast cancer patient with metastatic disease 11. From parental
MDA-MB-231 cells we isolated various sub-lines with distinct organ-specific metastatic
behavior 12,13. The sub-line SCP2 is highly metastatic to bone via arterial circulation,
whereas sub-line SCP3 is highly metastatic to the adrenal glands. A retroviral reporter
vector, Cis-TGFβ1-Smads-HSV1-tk/GFP, was created in which a fusion protein containing
HSV1 thymidine kinase (HSV1-tk) and green fluorescent protein (GFP) was placed under
the transcriptional control of a TGFβ-responsive promoter element (Figure 2a). We chose
the TGFβ responsive element (TβRE) from the mouse germline Igα promoter 14,15. This
TβRE is recognized by Smad2/3-Smad4 in complex with RUNX family members and
responds to TGFβ in many different cell lines 14,15. RUNX activity in breast cancer cells
is implicated in osteolytic bone metastasis 16. Cis-TGFβ1-Smads-HSV1-tk/GFP was
transduced into SCP2 and SCP3 cells together with a second retroviral vector,
SFG-tdRFP-cmvFLuc, expressing red fluorescent protein (tdRFP) 17 and firefly luciferase
(FLuc) under constitutive promoters (Figure 2a). The RFP-positive cells expressed green
fluorescence in response to TGFβ, demonstrating responsiveness of the HSV1-tk/GFP
construct (Figure 2b, c). When inoculated into the arterial circulation of immunodeficient
mice, SCP2 cells formed aggressive bone metastases, as visualized by luciferase
bioluminescence imaging (Figure 2d). These lesions also expressed TK activity, as
determined by micro-positron emission tomography (micro-PET) (Figure 2d). SCP3
cells formed small bone metastases and very large adrenal metastases (Figure 2e, top)
12. Interestingly, while the small bone metastases formed by SCP3 expressed TK
activity in the live animals, the large adrenal metastases formed by the same cells did
not (Figure 2e, top). The location of these lesions was verified by ex vivo
bioluminescence of the affected organs after necropsy (Figure 2e, bottom). These
results suggest that breast cancer cells undergo Smad-dependent transcriptional
activation in the bone microenvironment.

We recently identified a set of genes that mediate osteolytic bone metastasis by
MDA-MB-231 cells 12. Among these genes, IL11 was of interest because of its role as an
enhancer of osteoclast differentiation 18 and as a mediator of osteolysis in breast cancer
bone metastasis 19,20. Enforced expression of IL11 in MDA-MB-231 cells increases their
bone metastatic activity 12. Intriguingly, IL11 is a TGFβ inducible gene 12,21, providing a
mechanism for the pro-metastatic activity of TGFβ in breast cancer. MDA-MB-231 cells
are defective in TGFβ cytostatic gene responses, including repression of c-myc and Id
genes 22, but retain many responses that are common among normal epithelial cells 23,
including IL11 induction (Figure 3a and Supplementary Table 1). A comparison of the
basal expression of TGFβ responsive genes in various MDA-MB-231 derivatives
revealed a sharp (>9-fold) and selective increase in the basal expression of IL11 in
highly bone-metastatic sub-lines compared to the poorly metastatic sub-lines, and
compared also to all the other TGFβ responsive genes (Supplementary Table 2;
summarized in Figure 3a). A smaller increase was observed in the basal expression of
CTGF, which is another TGFβ responsive gene implicated in bone metastasis 12
(Supplementary Table 2). Thus the bone metastatic cells overexpressed certain TGFβ


Kang  et  al  p.  5 


responsive  genes  that,  in  the  context  of  the  bone  marrow  microenvironment,  stimulate 
osteolytic  metastasis. 

Several results suggested that IL11 is an immediate TGFβ target gene. IL11 induction
by TGFβ is rapid, peaking at 2 h and gradually declining thereafter (Figure 3b), and the
protein synthesis inhibitor cycloheximide does not block this response (data not shown).
TGFβ stimulation induces the binding of Smad2/3 and Smad4 to the IL11 promoter in
chromatin immunoprecipitation experiments 12. To determine whether the Smad
pathway is required for IL11 induction and bone metastasis, we analyzed MDA-MB-231
single cell progeny (SCP) sub-lines that were depleted of Smad4 by means of RNAi.
Compared to parental cells or in vivo selected bone-metastatic populations, which are
heterogeneous, SCPs are derived from single cells and, therefore, are more
homogeneous in genetic makeup 12,24. Three bone metastatic sub-lines, SCP2, SCP25
and SCP28, were engineered to stably express the short-hairpin RNA (shRNA) probes
Smad4-shRNA1 or Smad4-shRNA2, which target different regions of the Smad4
mRNA. Expression of Smad4-shRNA1 reduced Smad4 protein levels by 70-90% in all
three SCPs, whereas Smad4-shRNA2 almost completely eliminated Smad4 production
(Figure 3c). As a control, we engineered a Smad4 vector (pBabe-hygro-Flag-Smad4M)
containing two silent mutations in the sequence targeted by Smad4-shRNA1 and an
N-terminal Flag epitope distinguishing the exogenous product from endogenous Smad4.
Transduction of this retrovirus ensured expression of Smad4 in cells containing
Smad4-shRNA1 (Figure 3c). As determined by Northern blot analysis, the IL11 response to
TGFβ was very weak in cells expressing Smad4-shRNA1 and undetectable in cells
expressing Smad4-shRNA2 (Figure 3d for SCP25; data not shown for SCP2 and
SCP28). Expression of Smad4M restored the TGFβ response in Smad4-shRNA1
expressing cells. A similar response pattern was observed at the level of IL11 protein
secretion, as determined by ELISA (Figure 3e; data not shown for SCP2 and
SCP28). Thus, Smad4 is essential for TGFβ activation of IL11 expression.

To further investigate the role of Smad factors in the IL11 response to TGFβ, we
focused on a 100 bp region immediately upstream of the TATA box in the IL11 promoter.
This region mediates the TGFβ response of the IL11 promoter in human epithelial and
carcinoma cells 21,25. A reporter construct under the control of the minimal IL11 promoter
[pIL11-(100)-Luc] 21 was unresponsive to TGFβ in the Smad4-deficient breast cancer
cell line MDA-MB-468 26 (Figure 4a). Expression of exogenous Smad4 enabled TGFβ
induction of this promoter, and this effect was further enhanced by co-transfection of
Smad2 or Smad3 (Figure 4a), arguing that Smads mediate transcriptional activation
from this promoter region.

This 100 bp region includes two AP1 binding sites, which are critical for IL11
transcription 21,25, and an adjacent GC-rich (92% GC) sequence with two putative SP1
sites (Figure 4b). No canonical Smad binding element (AGAC sequence) is present in
this region. However, Smads can bind to GC-rich sequences in certain promoters 2.
Deletion analysis of the IL11 promoter region by means of a reporter construct indicated
that the response to TGFβ minimally requires the 5' AP1 site and an adjacent GC-rich
sequence (Figure 4b). In electrophoretic mobility shift assays, recombinant Smad4
bound to the wild type minimal IL11 promoter probe, resulting in the formation of a
complex that could be shifted by addition of anti-Smad4 monoclonal antibody (Figure
4c). Mutation or deletion of the AP1 sites decreased but did not abolish Smad4 binding
to the probe, whereas the AP1 sites alone did not bind Smad4 (Figure 4c). The binding
of endogenous Smad and AP1 factors to this region was assessed by means of
oligonucleotide precipitation assays. MDA-MB-231 cells were incubated with or without
TGFβ for 2 h, lysed, and precipitated with biotinylated double-stranded DNA probes.
Immunoblotting of DNA-bound factors demonstrated TGFβ-dependent binding of
endogenous Smad3 and Smad4 to the wild type IL11 minimal promoter region, and
TGFβ-independent binding of the endogenous AP1 component JunB to this region
(Figure 4d). Deletion or mutation of the AP1 sites eliminated binding of JunB and
weakened Smad binding.
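
The GC content quoted for this region can be checked with a short calculation. The sketch below is illustrative only: it assumes that the "GC" mobility-shift probe listed in the Experimental procedures approximates the GC-rich promoter sequence referred to here, and the function name gc_fraction is our own.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


# "GC" EMSA probe from the Experimental procedures. Assumption: this probe
# approximates the GC-rich segment of the minimal IL11 promoter.
GC_PROBE = "GGCCGGCCCTCCCCTGCCGCCTGCCCCCCGCCCGCCCGCCCCAGGCCCC"

gc = gc_fraction(GC_PROBE)
print(f"{len(GC_PROBE)} nt, GC fraction = {gc:.2f}")
```

Under that assumption the probe is about 92% GC, in line with the figure given in the text.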

Consistent with a role of AP1 in the IL11 response to TGFβ in the breast cancer cells,
the AP1 activator 12-O-tetradecanoylphorbol-13-acetate (TPA) 27 increased the basal
level of IL11 expression as well as the level upon TGFβ stimulation, whereas the AP1
inhibitor curcumin 27 abolished the activation of IL11 by TGFβ (Figure 4e). As
determined using an AP1 reporter construct (4xAP1-luciferase), the basal level of AP1
activity was significantly higher in the highly metastatic sub-lines SCP2, SCP25, SCP28
and 1833 than in poorly metastatic sub-lines SCP4 and SCP6 or parental MDA-MB-231
cells (Figure 4f). The level of AP1 activity in these cell populations was closely
correlated with the basal level of IL11 expression (Figure 4f; refer to Supplementary
Table 2). No change in 4xAP1-luciferase activity was observed after 4 h of TGFβ
treatment (data not shown). Collectively, these results suggest that TGFβ-activated
Smad proteins bind to the GC-rich region in the proximal IL11 promoter. This binding is
strengthened by the presence of a proximal AP1 site, and transcriptional activation
results from a cooperation between Smad3 and AP1. These observations also indicate
a role of AP1 in the hyperactivity of IL11 in bone metastatic MDA-MB-231 cells.

Having shown that the TGFβ response of a bone metastasis gene in these cells
required Smad function, we tested the contribution of Smad signaling to the metastatic
process itself. Wild-type, Smad4-knockdown, and Flag-Smad4M versions of the various
SCPs were infected with a retroviral vector expressing an HSV1-tk/GFP/luciferase triple
fusion protein 28. The cells were inoculated into the left cardiac ventricle of
immunodeficient mice to allow the formation of bone metastasis. As determined by
bioluminescence imaging of luciferase activity, the inoculated cells became immediately
distributed throughout the entire animal, followed by extensive clearing within one week
(Figure 5a). Accumulation of luciferase signal was clear 14 days after injection and
became more intense over the following weeks. To quantify the rate of metastatic
growth in bone, a region of interest (ROI) was drawn around the bone metastasis
signals near the joint of the affected hind limbs, and the normalized photon counts of
each metastasis were plotted (Figure 5b). A linear correlation between the intensity of
the bioluminescence and tumor burden is obtained using this method 29. Suppression of
Smad4 activity by two different shRNA constructs caused a significant reduction in the
growth rate of bone metastatic lesions (Figure 5a, b). Restoration of Smad4 function by
the shRNA-insensitive Smad4M construct restored the wild-type rate of metastatic
growth (Figure 5a, b). These results were consistently observed in all three SCPs
tested (Figure 3c, and data not shown for SCP2 and SCP28). Formation of overt
osteolytic bone metastases was monitored by weekly full-body X-ray imaging of the
mice. Smad4 depletion consistently reduced the rate of bone metastasis formation in all
three MDA-MB-231 SCPs and in the in vivo-selected bone-metastatic population 1833
12 (Figure 5c). A significant level of metastatic activity still remained after Smad4
depletion, which is consistent with the TGFβ-independent involvement of several genes
(MMP1, CXCR4, Osteopontin and others) in these lesions 12. Smad4 knockdown did not
decrease the growth rate of the SCPs or 1833 cells in culture (data not shown) or their
ability to form subcutaneous tumors in mice (Figure 5d), arguing that the
Smad4-dependent growth of these tumors is specifically stimulated by the bone
microenvironment.

In sum, our results show that the Smad tumor suppressor pathway may become
pro-metastatic in breast cancer. The intrinsic genomic instability of tumor cell populations
allows for the selection of functions that favor growth in a given environment. Thus, a
bone metastatic lesion will harbor functions that the bone environment selects for. We
speculate that pro-metastatic Smad-mediated gene responses can emerge once this
pathway becomes uncoupled from tumor-suppressor effects. If at that point a Smad
pathway can provide metastatic functions to cancer cells, it likely will be selected as a
pro-metastatic force. Smad-responsive genes like IL11 and others can provide an
advantage to cancer cells in a TGFβ-rich bone microenvironment. Therefore, an
increase in the basal expression of these genes coupled with their further induction by
bone-derived TGFβ would favor tumor growth in the bone. Our results are fully
consistent with this possibility. By implicating the Smad pathway in the osteolytic vicious
cycle of breast cancer metastasis 1, our results additionally call attention to the
possibility of therapeutically targeting this pathway 6,30 in TGFβ-rich metastatic sites.



Experimental  procedures 
Tumor  sample  analysis. 

Formalin-fixed  paraffin  embedded  (FFPE)  bone  metastasis  tissues  were  obtained  from 
therapeutic  procedures  performed  as  part  of  routine  clinical  management  of  breast 
cancer  patients  at  our  institution.  Hematoxylin  and  eosin  stained  sections  were 
examined  for  regions  that  contained  both  tumor  cells  and  stroma,  which  were  further 
analyzed  for  phosphorylated  Smad2  on  serial  sections.  All  studies  were  conducted 
under  MSKCC  Institutional  Review  Board  approved  protocols. 

Immunohistochemistry 

Immunohistochemical analysis was performed with a Discovery XT System (Ventana
Medical Systems) using tissue sections blocked for 30 minutes in 10% normal goat
serum (Vector Laboratories; catalog # S-1000) and 2% BSA. Incubation with
anti-phospho-Smad2 (Ser465/467) primary antibody (Cell Signaling; catalog # 3101;
dilution 1:500) was carried out for 3 h at room temperature, followed by a 1 h incubation
with biotinylated anti-rabbit secondary antibody at 1:200 dilution (Vectastain ABC Kit
Rabbit IgG; catalog # PK-6101) and DAB detection kit (Ventana Medical Systems)
according to the manufacturer's instructions.

TGFβ1-Smads-HSV1-tk/GFP reporter system

Double-stranded complementary oligonucleotides, containing a sequence from the
mouse germline Igα promoter 5'-AATTCGGCCATGTGGTCAGACACACCTGTCT
CCACCACAGCCAGACCACAGGCCAGACATGACGTGGAGGTT-3' 31, were used to
construct the TGFβ1-Smads-HSV1-tk/GFP reporter vector. After annealing of
oligonucleotides, the resulting DNA fragment was cloned into the EcoRI and XbaI sites
of the dxNFAT-tk/GFP-Neo vector 32 in place of the NFAT enhancer element. Thus, the
Herpes Simplex Virus 1 thymidine kinase-eGFP (HSV1-tk/GFP) fusion reporter gene
was linked to the enhancer elements specific for Smad-AML transcriptional complexes.
The resulting plasmid was transfected into the GPG29 packaging cell line with
Lipofectamine2000 (Invitrogen, Carlsbad, CA). The retrovirus-containing medium was
collected for 4 consecutive days and stored at -80°C. The retrovirus was then used to
transduce MDA-MB-231 cells and their sub-line SCP3 12,24. Selection of stable
transfectants was accomplished by adding 1 g/L of G418. Cells containing the
TGFβ1-Smads-HSV1-tk/GFP reporter system were further transduced with a second
retroviral vector, SFG-tdRFP-cmvFLuc, in which tdRFP 17 and firefly luciferase encoding
cDNAs were placed under constitutive promoters. RFP positive cells were sorted by FACS.

The retrovirus vector encoding a TK-eGFP-Luciferase triple fusion protein has been
previously described 28.

Transcriptomic  profiling  and  clustering  analyses 

Tissue collection, RNA sample collection and generation of biotinylated complementary
RNA (cRNA) probe were carried out essentially as described in the standard Affymetrix
(Santa Clara, CA) GeneChip protocol. Each sample was hybridized with an Affymetrix
Human Genome U133A microarray for 16 h at 45°C. Absolute analysis of each chip and
comparative analysis of TGFβ treated samples with the untreated samples were carried
out using the Affymetrix Microarray Suite 5.0 Software. Genes whose expression level
was changed by more than two-fold with p < 0.001 were scored as TGFβ regulated
genes. Dendrogram illustrations of TGFβ gene responses in the MDA-MB-231 and
MCF-10A cell lines were produced using GeneSpring (Silicon Genetics, CA) software.
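
The scoring rule described above (more than two-fold change with p < 0.001) can be sketched as a simple filter. This is a minimal illustration, not the Microarray Suite 5.0 algorithm: the ProbeCall record, the function names and all numeric values are hypothetical, and the way the software derives its p-value is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ProbeCall:
    """One probe set compared between TGF-beta-treated and untreated samples."""
    name: str
    untreated_signal: float
    treated_signal: float
    p_value: float


def fold_change(call: ProbeCall) -> float:
    """Return the magnitude of change (>= 1), regardless of direction."""
    ratio = call.treated_signal / call.untreated_signal
    return ratio if ratio >= 1.0 else 1.0 / ratio


def is_tgfb_regulated(call: ProbeCall, min_fold: float = 2.0,
                      max_p: float = 0.001) -> bool:
    """Apply the two thresholds stated in the text."""
    return fold_change(call) > min_fold and call.p_value < max_p


# Hypothetical values for illustration only; not data from this study.
calls = [
    ProbeCall("probe_A", 100.0, 450.0, 0.0002),  # induced 4.5-fold
    ProbeCall("probe_B", 800.0, 200.0, 0.0005),  # repressed 4-fold
    ProbeCall("probe_C", 300.0, 390.0, 0.0001),  # 1.3-fold, below threshold
    ProbeCall("probe_D", 100.0, 300.0, 0.02),    # 3-fold, but p too high
]
regulated = [c.name for c in calls if is_tgfb_regulated(c)]
print(regulated)
```

Treating induction and repression symmetrically is a design choice of this sketch, so that a four-fold repressed gene is scored the same way as a four-fold induced one.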

Cell  culture  and  retroviral  transduction 

The parental MDA-MB-231 cell line (ATCC) and its various sublines, as well as the A549
cell line, were maintained in DMEM medium supplemented with 10% fetal bovine serum
(FBS), penicillin, streptomycin and fungizone. Phoenix cells, a helper cell line for
retrovirus production, were maintained in DMEM medium supplemented with 10% fetal
bovine serum (FBS), 1% glutamine and antibiotics.

Retroviruses expressing Smad4 shRNA, FLAG-Smad4M, or imaging proteins were
produced from the amphotropic Phoenix packaging cell line. Phoenix cell transfections
were performed using LipofectAMINE (Invitrogen), according to the manufacturer's
instructions. Viruses were harvested 48 h and 72 h after transfection, filtered, and used to
infect MDA-MB-231 cell cultures in the presence of 5 µg/ml of polybrene. Infected cells
were selected by fluorescence activated cell sorting (FACS) for GFP positive cells, or by
selection for puromycin or hygromycin resistance. To avoid clonal variations, we pooled
at least 2000 individual transfectants for each stable cell line produced by transduction.

Plasmids 

The minimal IL11 promoter region containing the TATA box (-31 to +52) was cloned as
a KpnI/BglI fragment into the corresponding sites in pXP2-luc (ATCC) to create
pIL11-TATA-Luc. Various IL11 promoter regions (Figure 1C) immediately upstream of
the TATA box were then inserted as BamHI/KpnI fragments upstream of the TATA box
to generate a series of luciferase reporters controlled by different regions of the IL11
promoter. Retroviral vectors that encode shRNAs against the hSmad4 transcript were
generated by cloning suitable oligonucleotide sequences into the pSUPER-retro-puro
vector 33. The coding strands of the Smad4 targeting shRNAs were
GGATGAATATGTGCATGAC (Smad4-shRNA1) and GGTGTGCAGTTGGAATGTA
(Smad4-shRNA2). A cDNA sequence encoding FLAG epitope-tagged hSmad4 was
cloned into the BamHI/SalI sites of pBabe-hygro 34 to generate pBabe-hygro-Flag-Smad4.
Silent mutations were generated by site-directed mutagenesis in the coding sequence
of Tyr162 (from TAT to TAC) and Val163 (from GTG to GTT) to create the
shRNA-insensitive Smad4 expression plasmid pBabe-hygro-Flag-Smad4M.
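
The logic of the shRNA-insensitive construct can be illustrated with a short check: the two codon changes keep the encoded amino acids but introduce two mismatches within the Smad4-shRNA1 target. The sketch below assumes that the adjacent TAT and GTG codons inside the shRNA1 target sequence are the Tyr162 and Val163 codons named in the text; variable names are our own.

```python
# Smad4-shRNA1 target (coding strand) as listed in the text.
SHRNA1_TARGET = "GGATGAATATGTGCATGAC"

# Only the four codons needed for this check (standard genetic code).
AMINO_ACID = {"TAT": "Tyr", "TAC": "Tyr", "GTG": "Val", "GTT": "Val"}

# Assumption: the adjacent TAT-GTG codons in the target are Tyr162-Val163.
start = SHRNA1_TARGET.index("TATGTG")
mutated_target = SHRNA1_TARGET[:start] + "TACGTT" + SHRNA1_TARGET[start + 6:]

# Both substitutions are silent at the protein level.
assert AMINO_ACID["TAT"] == AMINO_ACID["TAC"]
assert AMINO_ACID["GTG"] == AMINO_ACID["GTT"]

# 0-based positions at which the mutated mRNA no longer matches the shRNA.
mismatches = [i for i, (a, b) in enumerate(zip(SHRNA1_TARGET, mutated_target))
              if a != b]
print(mutated_target, mismatches)
```

The two mismatches fall near the center of the 19-nucleotide target, which is consistent with the construct being described as insensitive to Smad4-shRNA1.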

Luciferase  reporter  assays 

Luciferase reporter assays were performed as previously described 22. 100 pM TGFβ1
(R&D Systems), 10 µg/ml of cycloheximide (Sigma), 100 nM TPA (Sigma), and 70 µM
curcumin (Sigma) were used to treat cells in various assays. Northern blot analysis
was carried out as previously described 22.

ELISA analysis

The production and secretion of IL11 in various sublines of MDA-MB-231 were
determined in 24 h-conditioned media using commercially available IL11 (R&D Systems)
ELISA kits according to the manufacturer's instructions.

Electrophoretic  mobility  shift  assay 

Purified full-length Smad4 protein was used in this experiment. Complementary
oligonucleotides corresponding to the wild-type IL11 promoter and its mutants were
annealed and end labeled with γ-32P-ATP. The sequences for the probes are:
5'-GGGTGAGTCAGGATGTGTCAGGCCGGCCCTCCCCTGCCGCCTGCCCCCCGCCCG
CCCGCCCCAGGCCCC-3' for W.T.; 5'-GGGTTCTTCAGGATTGTTCAGGCCGGCCC
TCCCCTGCCGCCTGCCCCCCGCCCGCCCGCCCCAGGCCCC-3' for mAP1; 5'-GGC
CGGCCCTCCCCTGCCGCCTGCCCCCCGCCCGCCCGCCCCAGGCCCC-3' for GC;
5'-GGGTGAGTCAGGATGTGTCA-3' for AP1; and 5'-GTAAGCCCGGCCAGCCGACC
GGGGC-3' for β-actin.
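
To see which nucleotides distinguish the mAP1 probe from the wild-type probe, the two mobility-shift sequences listed above can be compared position by position. This is an illustrative sketch; the sequences are transcribed from the probe list, and the helper name differing_positions is hypothetical.

```python
# Shared GC-rich 3' portion of the W.T. and mAP1 EMSA probes.
SHARED_TAIL = "CCGGCCCTCCCCTGCCGCCTGCCCCCCGCCCGCCCGCCCCAGGCCCC"

# The probes differ only within the first 22 nucleotides (the AP1 sites).
WT_PROBE = "GGGTGAGTCAGGATGTGTCAGG" + SHARED_TAIL
MAP1_PROBE = "GGGTTCTTCAGGATTGTTCAGG" + SHARED_TAIL


def differing_positions(a: str, b: str) -> list:
    """Return 1-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = differing_positions(WT_PROBE, MAP1_PROBE)
print(diffs)
for pos in diffs:
    print(pos, WT_PROBE[pos - 1], "->", MAP1_PROBE[pos - 1])
```

The substitutions cluster in two groups of three within the two TGAGTCA-like motifs, leaving the GC-rich portion of the probe unchanged.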

The DNA-protein binding reactions were performed and analyzed on a 5%
nondenaturing gel (Brunet et al., 1999). For supershift assessment, 1 µl of mouse
monoclonal antibody (BD Transduction Laboratories; catalog # 610843) against Smad4
was preincubated with full-length recombinant His-Smad4 for 5 minutes on ice.
DNA-protein complexes were visualized by autoradiography.

DNA  precipitation  assay 

DNA precipitation assays were carried out as described previously 35. The sequences of
oligonucleotides used are as follows:
5'-GGGTGAGTCAGGATGTGTCAGGCCGGCCCTCCCCTGCCGCCTGCCCCCCGCCCGCCCGCCCCA-3'
for WT; 5'-GGGACAATCCGGACAATCCGGCCGGCCCTCCCCTGCCGCCTGCCCCCC
GCCCGCCCGCCCCA-3' for mAP1; 5'-GGCCGGCCCTCCCCTGCCGCCTGCCCCC
CGCCCGCCCGCCCCAGGCCCCC-3' for GC; 5'-GGGTGAGTCAGGATGTGTCAGGC
CGGCCCTCCCCTGCCGCC-3' for AP1GC5'; and 5'-GGGTGAGTCAGGATGT
GTCATGCCCCCCGCCCGCCCGCCCCA-3' for AP1GC3'. Nucleotides that were
mutated in the corresponding mutant probes are highlighted with bolded, underlined
letters. The sequence for TIE has been reported previously 35.

Intracardiac  injections 

Cells were harvested from subconfluent cell culture plates, washed with PBS, and
resuspended at 10^6 cells/ml in PBS. 0.1 ml of the suspended cells was
injected into the left cardiac ventricle of 4-week-old, female BALB/c-nu/nu nude mice
(NCI) using 26-gauge needles as previously described 10. Mice were anesthetized with
ketamine (100 mg/kg body weight) and xylazine (10 mg/kg body weight) before injection.
A successful injection was characterized by the pumping of arterial blood into the
syringe and by immediate bioluminescence imaging.
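
The quantities implied by this procedure follow from simple arithmetic, sketched below. The cell concentration, injection volume and per-kilogram doses are taken from the text; the 20 g body weight is a hypothetical value chosen only to make the example concrete.

```python
# Values stated in the procedure.
CELLS_PER_ML = 1_000_000          # 10^6 cells/ml
INJECTION_VOLUME_UL = 100         # 0.1 ml
KETAMINE_MG_PER_KG = 100.0
XYLAZINE_MG_PER_KG = 10.0

# Assumption for illustration only: a 20 g mouse.
ASSUMED_BODY_WEIGHT_G = 20.0


def dose_mg(dose_mg_per_kg: float, body_weight_g: float) -> float:
    """Convert a per-kilogram dose into an absolute dose in mg."""
    return dose_mg_per_kg * body_weight_g / 1000.0


# Integer arithmetic avoids floating-point rounding in the cell count.
cells_per_mouse = CELLS_PER_ML * INJECTION_VOLUME_UL // 1000
ketamine_mg = dose_mg(KETAMINE_MG_PER_KG, ASSUMED_BODY_WEIGHT_G)
xylazine_mg = dose_mg(XYLAZINE_MG_PER_KG, ASSUMED_BODY_WEIGHT_G)

print(cells_per_mouse, ketamine_mg, xylazine_mg)
```

Each mouse therefore receives 10^5 cells per injection; the absolute anesthetic doses scale linearly with the actual body weight.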

Bioluminescence  imaging  and  analysis 

Anesthetized  mice  were  retro-orbitally  injected  with  75  mg/kg  of  D-Luciferin  (Xenogen) 
in  PBS.  Bioluminescence  images  were  acquired  using  the  IVIS  Imaging  System 
(Xenogen)  at  2-5  minutes  post-injection.  Acquisition  times  at  the  beginning  of  the  time 
course  started  at  60  seconds  and  were  reduced  in  accord  with  signal  strength  to  avoid 
saturation. Analysis was performed using LivingImage software (Xenogen) by
measuring photon flux (in photons/sec/cm^2/steradian) using a region of
interest  (ROI)  drawn  around  the  bioluminescence  signal  to  be  measured.  Images  were 
set  at  the  indicated  pseudo-color  scale  to  show  relative  bioluminescent  changes  over 
time.  Data  were  normalized  to  the  signal  obtained  right  after  xenografting  (day  0). 
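
The normalization step described above can be sketched as follows. This is a minimal illustration rather than the LivingImage workflow: the function name and the photon flux values are hypothetical, and a single ROI followed over time is assumed.

```python
from typing import Dict


def normalize_to_day0(flux_by_day: Dict[int, float]) -> Dict[int, float]:
    """Divide each ROI photon flux by the day 0 signal from the same ROI."""
    baseline = flux_by_day[0]
    if baseline <= 0:
        raise ValueError("day 0 signal must be positive")
    return {day: flux / baseline for day, flux in sorted(flux_by_day.items())}


# Hypothetical ROI photon flux values (photons/sec/cm^2/steradian);
# illustration only, not data from this study.
roi_flux = {0: 2.0e6, 7: 4.0e5, 14: 1.0e6, 21: 8.0e6}

normalized = normalize_to_day0(roi_flux)
for day, value in normalized.items():
    print(f"day {day:2d}: {value:.2f}")
```

Expressing each time point relative to day 0 corrects for differences in the number of cells delivered per animal, so that growth curves from different mice can be compared on one scale.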

Micro-PET  imaging 

Micro-PET imaging was performed using 18F-2'-fluoro-2'-deoxy-1-β-D-
arabinofuranosyl-5-ethyl-uracil ([18F]FEAU) as the HSV1-TK substrate, as previously
described 36. Two hours before whole body positron emission tomography (PET), the
mice were administered [18F]FEAU (i.v. 100 µCi/animal). Imaging was performed on a
microPET (Concorde Microsystems, Knoxville, TN) and images were acquired over 15
minutes under inhalation anesthesia (Isoflurane 2%).

Radiographic analysis of bone metastasis

Development  of  bone  metastases  was  monitored  by  X-ray  radiography.  Mice  were 
anesthetized,  arranged  in  prone  position  on  single-wrapped  films  (X-OMAT  AR, 
Eastman  Kodak,  Rochester,  NY),  and  exposed  to  an  X-ray  at  35kV  for  15  seconds 
using  a  Faxitron  instrument  (Model  MX-20;  Faxitron  Corp.  Buffalo,  IL,  USA).  Films  were 
developed using a Konica SRX-101A processor and inspected for visible bone lesions.

Figure Legends

Figure 1. Activated Smad pathway in breast cancer bone metastasis. a, Summary
of phospho-Smad2 immunoreactivity in tumor cells and stromal cells in 16 samples of
human breast cancer bone metastases; -, +, ++, +++ indicate none, weak, moderate
and intense immunoreactivity, respectively. b-d, Examples of intense
immunohistochemical staining of receptor-phosphorylated Smad2 in breast cancer bone
metastasis samples from different patients. The samples shown were chosen to
illustrate the nuclear phospho-Smad2 staining in a metastatic island and the
surrounding stroma (b), in a cluster of metastatic islands (c), or in a contiguous
metastatic mass (d), as well as a cluster of islands stained using normal rabbit serum as
a negative control (e).

Figure  2.  Functional  imaging  of  Smad  signaling  in  breast  cancer  bone  metastasis 

a, Schematic representation of the retroviral vectors SFG-tdRFP-cmvFLuc,
constitutively expressing tdRFP and firefly luciferase; and
Cis-TGFβ1-Smads-HSV1-tk/GFP, expressing HSV1-tk/GFP fusion protein in response to
TGFβ. b and c, SCP3 cells transduced with these two vectors were treated with TGFβ or
no additions for 24 h and analyzed by fluorescence microscopy (b) or two-color FACS
(c). The constitutive tdRFP fluorescence is shown on the ordinate, and the HSV1-tk/GFP
fusion fluorescence, inducible by TGFβ, is shown on the abscissa. d and e, In vivo
bioluminescence and microPET imaging of metastases in mice. SCP2 (d) and SCP3 cells
(e), bearing the SFG-tdRFP-cmvFLuc and Cis-TGFβ1-Smads-HSV1-tk/GFP vectors, were
injected into the left cardiac ventricle and analyzed after 4 weeks (SCP2) or 18 weeks
(SCP3). Bioluminescence imaging shows sites of metastases in the skull (in d, e) and
adrenal gland (in e). [18F]FEAU micro-PET images of tk/GFP reporter activation show
localization of radioactivity to the skull in both coronal and sagittal image planes. The
adrenal metastasis was not visualized on microPET imaging. Note non-specific
accumulation of the tracer in the gastrointestinal tract and bladder, attributable to
clearance of the tracer. At necropsy, the head showing the skull and the adrenal
metastasis plus kidney were removed and imaged ex vivo for photographic (-) and
bioluminescence (+) imaging (e, lower panel).

Figure 3. Smad4-dependent transcriptional activation of IL11 by TGFβ. a, Basal
expression levels of 50 TGFβ-activated genes and 21 TGFβ-repressed genes in
MCF-10A and MDA-MB-231 cells were normalized to the same level. Responses of these
genes to TGFβ in each cell line are represented by different shades of red (degrees of
activation) or blue (degrees of repression) in the dendrogram. The ratio of basal
expression levels of these 71 genes in highly metastatic versus weakly metastatic
MDA-MB-231 cells is represented by a bar graph in the right panel. b, Parental
MDA-MB-231 cells were incubated with TGFβ for the indicated times. Total RNA was
subjected to Northern blot analysis using IL11 and glyceraldehyde-3-phosphate
dehydrogenase (GAPDH) probes. c, Several single cell progenies (SCPs) derived from
MDA-MB-231 were infected with retroviruses expressing Smad4-targeting shRNAs or
shRNA-insensitive Flag-tagged Smad4. Protein expression was assessed by direct
immunoblotting of total lysates using the indicated antibodies. d, SCP25 and its
derivatives (refer to Figure 3c) were incubated in the absence or presence of TGFβ for
2 h. Total RNA was subjected to Northern blot analysis with the indicated probes. e, SCP25
and its derivatives were treated with or without TGFβ for 24 h. IL11 production in the
media was determined by ELISA. Data are the average of triplicate
determinations ± S.D.

Figure 4. Role of AP1 and Smad in the basal activity and the TGFβ response of the
IL11 promoter. a, Smad4-deficient MDA-MB-468 cells were transfected with 1 μg of
pIL11(-100)-Luc reporter plasmid (ref. 21), together with 0.5 μg of the indicated Smad
expression plasmids (ref. 35), treated with or without TGFβ, and analyzed for luciferase
activity. Data are the average of triplicate determinations ± S.D. b, Top: Nucleotide
sequence of the minimal TGFβ-responsive region of the IL11 promoter. Nucleotide
sequence positions are indicated relative to the transcription start site. Two AP1 sites
(red boxes) and a GC-rich sequence (green) containing two SP1 sites (green boxes) are
indicated. Bottom: A549 and MDA-MB-231 cells were transfected with the indicated
IL11 reporter constructs, treated with or without TGFβ for 16-20 h prior to lysis, and
analyzed for luciferase activity. The schematic representation of each promoter
construct is shown on the left. Data are the average of triplicate determinations ± S.D. c,
[γ-32P]ATP end-labeled probes matching the wild-type IL11 proximal promoter region,
this region with mutant AP1 sites, or the indicated fragments of this region, were
subjected to electrophoretic mobility shift analysis with recombinant full-length
His-Smad4 protein. Antibody against Smad4 was added as indicated to create super-shifts.
The β-actin promoter was used as a negative control. Schematic representations of the
probes are shown at the top. d, MDA-MB-231 cells were incubated in the absence or
presence of TGFβ for 2 h. Cell lysates were incubated with biotinylated oligonucleotides
corresponding to the indicated IL11 promoter probes. DNA-bound proteins were
precipitated by streptavidin-agarose and detected by immunoblotting. A mutant c-myc
TGFβ response element (mTIE) was used as a negative control. e, A549 cells were
incubated with 100 nM TPA, 70 μM curcumin or no additions for 30 minutes, and then
with 100 pM TGFβ for the indicated period. Total RNA was subjected to Northern blot
analysis with the indicated probes. f, Various MDA-MB-231 sublines were transfected
with 1 μg of 4xAP1-Luc reporter plasmid and analyzed for luciferase activity 2 days after
transfection. Data are the average of triplicate determinations ± S.D. The absolute
values of IL11 mRNA level as detected by Affymetrix U133A GeneChip are plotted in
the same graph (yellow circles). The scales for the luciferase activity and for IL11
GeneChip expression values are shown on the left and right sides of the graph,
respectively.

Figure 5. Smad4 mediation of breast cancer bone metastasis. Wild-type and
genetically modified SCP25 cells were labeled with the TGL reporter and 1 x 10^5 cells were
injected into the left cardiac ventricle of five mice for each cell line. At the indicated
days post-xenografting, bioluminescence images were acquired and quantified. a,
Representative mice from each group are shown in the supine position. The signal
intensities for days 24 and 36 are on equivalent scales, while days 0, 7 and 14 are
each on separate scales because of increasing signal strength and to avoid signal
saturation. b, The normalized photon counts from the bone metastases in the hindlimbs
were measured over the indicated time course. c, Kaplan-Meier curves
showing the incidence of bone metastasis by the indicated wild-type and
Smad4-knockdown MDA-MB-231 sub-lines. 10^5 tumor cells were inoculated into the left
cardiac ventricle of nude mice. Metastasis was scored as the time to first appearance of a
visible bone lesion by X-ray imaging of the whole mouse. The percent of animals in
each group that were free of detectable bone metastases is plotted. d, 10^6 tumor cells
were injected subcutaneously into nude mice. Subcutaneous tumor growth was
monitored and quantified by caliper measurements. No significant difference was found
between wild-type and Smad4-knockdown cells.


Supplementary Table 1. Summary of TGFβ target genes in three normal human
epithelial cell lines and MDA-MB-231 breast cancer cells.

Shown are gene responses observed in at least two out of the three cell lines derived
from normal tissues (HaCaT keratinocytes, MCF10A mammary epithelial cells and
HPL1 lung epithelial cells) (ref. 23), and the response of these genes in MDA-MB-231 cells. I:
signal increased by TGFβ by more than 2-fold; D: signal decreased by TGFβ by more
than 2-fold.


Supplementary Table 2. Basal level of TGFβ epithelial cell target genes in various
MDA-MB-231 sub-lines of different bone metastatic activity.

The list of genes examined corresponds to genes whose expression was increased or
decreased in response to TGFβ in at least two out of three cell lines (HaCaT
keratinocytes, MCF10A mammary epithelial cells and HPL1 lung epithelial cells) derived
from normal tissue (3E TGFβ response signature) (ref. 23). The two genes whose basal
expression level was >3-fold higher in highly bone-metastatic cells compared to poorly
metastatic cells are highlighted.

Supplementary Table 1.

Probe Set | Description | Responses in the order listed (cell-line columns: HaCaT, MCF-10A, HPL1, MDA231; blank cells not shown)
201170_s_at | Basic helix-loop-helix domain containing, class B, 2 | I I I I
201329_s_at | V-ets avian erythroblastosis virus E26 oncogene homolog 2 | I I
201389_at | Integrin, alpha 5 (fibronectin receptor, alpha polypeptide) | I I I
201416_at | SRY (sex determining region Y)-box 4 | I I
201466_s_at | c-jun | I I I
201473_at | Jun B proto-oncogene | I I I
201739_at | Serum/glucocorticoid regulated kinase | I I I
202149_at | Enhancer of filamentation (HEF1) | I I I I
202150_s_at | Enhancer of filamentation 1 (cas-like docking; Crk-associated substrate related) | I I I I
202284_s_at | p21Cip1 | I I I
202628_s_at | Plasminogen activator inhibitor type 1 | I I I I
202672_s_at | Activating transcription factor 3 (ATF3) | I I
203592_s_at | Follistatin-like 3 (secreted glycoprotein) | I I I
204255_s_at | Vitamin D (1,25-dihydroxyvitamin D3) receptor | I I I
204790_at | Smad7 | I I I I
205330_at | Meningioma (disrupted in balanced translocation) 1, MN1 | I I I
205387_s_at | Chorionic gonadotropin, beta polypeptide | I I I I
205479_s_at | Plasminogen activator, urokinase | I I I
205596_s_at | E3 ubiquitin ligase Smurf2 | I I
205807_s_at | Tuftelin 1 | I I I
206277_at | Purinergic receptor P2Y, G-protein coupled, 2 | I I I I
206675_s_at | Sno | I I I I
206924_at | Interleukin 11 | I I I
207147_at | Distal-less homeo box 2 | I I I I
207530_s_at | p15Ink4b | I I I
207574_s_at | Growth arrest and DNA-damage-inducible, beta | I I I
208083_s_at | Integrin, beta 6 (ITGB6) | I I I
208322_s_at | Sialyltransferase 4A (beta-galactoside alpha-2,3-sialyltransferase) | I I
209098_s_at | Jagged 1 (Alagille syndrome) | I I I
209101_at | Connective tissue growth factor | I I I I
209193_at | Pim-1 oncogene | I I
209681_at | Solute carrier family 19 (thiamine transporter), member 2 | I I I
209706_at | NK homeobox (Drosophila), family 3, A | I I I
209765_at | A disintegrin and metalloproteinase domain 19 (meltrin beta) | I I I
210214_s_at | Bone morphogenetic protein receptor, type II (serine/threonine kinase) | I I I
210999_s_at | Growth factor receptor-bound protein 10 | I I
211165_x_at | EphB2 | I I I
211527_x_at | Vascular endothelial growth factor | I I I I
211981_at | Collagen, type IV, alpha 1 | I I
212666_at | E3 ubiquitin ligase Smurf1 | I I
213039_at | Rho-specific guanine nucleotide exchange factor p114 | I I
216199_s_at | Mitogen-activated protein kinase kinase kinase 4 | I I
216268_s_at | Jagged 1 (Alagille syndrome) | I I I
217227_x_at | Immunoglobulin lambda locus | I I
217875_s_at | Transmembrane, prostate androgen induced RNA | I I I
219257_s_at | Sphingosine kinase 1 | I I I
219682_s_at | T-box 3 (ulnar mammary syndrome) | I I
219825_at | Cytochrome P450 retinoid metabolizing protein | I I
221009_s_at | Angiopoietin-like 4 | I I I I
221029_s_at | Wingless-type MMTV integration site family, member 5B | I I
201008_s_at | Thioredoxin interacting protein | D D D
201010_s_at

Abstract


Little is known of the underlying biology of estrogen receptor-negative, progesterone
receptor-negative [ER(-)/PR(-)] breast cancer (BC), and few targeted therapies are
available.  Clinical  heterogeneity  of  ER(-)/PR(-)  tumors  suggests  that  molecular 
characterization  may  provide  insight  into  their  biology,  reveal  distinct  subsets  and 
identify  new  therapeutic  targets.  We  performed  genome-wide  expression  analysis  of  99 
primary  BC  samples  and  8  BC  cell  lines  and  identified  a  subset  of  ER(-)/PR(-)  tumors 
with  expression  of  genes  known  to  be  either  direct  targets  of  ER,  responsive  to  estrogen, 
or  differentially  expressed  in  ER(+)  BC.  Differentially  expressed  genes  included  SPDEF, 
FOXA1,  XBP1,  CYB5,  TFF3,  NAT1,  APOD,  ALCAM  and  AR  (p<0.001).  A 
classification  model  based  on  the  expression  signature  of  this  tumor  class  identified 
molecularly  similar  breast  cancers  in  an  independent  human  breast  cancer  data  set  and 
among  breast  cancer  cell  lines  (MDA-MB-453).  This  cell  line  demonstrated  a 
proliferative response to androgen in an androgen receptor-dependent and estrogen
receptor-independent manner. In addition, the androgen-dependent transcriptional
program of MDA-MB-453 significantly overlapped the molecular signature of the unique
ER(-)/PR(-) subclass of human tumors. This subset of breast cancers, characterized by a
hormonally related transcriptional program and proliferative response to androgen,
suggests the potential for therapeutic strategies targeting the androgen signaling pathway.

Introduction


Breast  cancer  remains  a  major  public  health  concern  in  the  United  States  and  is 
the  second  highest  cause  of  cancer  death  in  women.  It  is  estimated  that  in  2005  over 
200,000  women  will  develop  breast  cancer  and  40,400  will  die  of  their  disease  (1).  The 
estrogen  receptor  (ER)  regulates  growth  and  differentiation  of  the  normal  mammary 
gland  and  is  important  in  the  development  and  progression  of  about  70%  of  breast  cancer. 
Like  other  steroid-hormone  receptors,  the  ER  mediates  its  downstream  effects  by  direct 
transcriptional  regulation  of  target  genes.  On  ligand  binding,  the  receptor  dissociates 
from  its  cytoplasmic  chaperones,  translocates  to  the  nucleus,  binds  to  specific  DNA 
sequences  called  estrogen-response  elements  (ERE)  and  initiates  gene  transcription  (2). 
Associated  co-regulatory  proteins  either  activate  or  repress  ER  transcriptional  activity 
(3).  In  recent  years  alternative  ER  signaling  via  direct  association  with  and  activation  of 
many signal transduction pathways has been described (4, 5). For several decades,
targeting the ER has been the cornerstone of treatment for ER-positive [ER(+)] breast
cancer. Estrogen deprivation therapy may be achieved by oophorectomy, selective
estrogen receptor modulators such as tamoxifen, and more recently by the use of
third-generation aromatase inhibitors (6) and direct estrogen receptor antagonists (7-11).

ER-negative,  progesterone  receptor-negative  [ER(-)/PR(-)]  breast  cancer 
represents  approximately  25  to  30%  of  all  breast  cancers  and  generally  has  a  more 
aggressive  clinical  course.  In  contrast  to  ER(+)  breast  cancer,  patients  with  ER(-)/PR(-) 
tumors  derive  little  or  no  benefit  from  anti-estrogen  therapy  (10)  and  targeted  therapies 
remain elusive (12). One notable exception has been the successful use of antibodies
targeting the tyrosine kinase receptor HER-2/neu (ERBB2) (12). Although ERBB2 can
be overexpressed in both ER(+) and ER(-) breast cancer, it tends to be disproportionately
found in ER(-) breast cancer (13).

In addition to ER, breast cancer cells express other nuclear hormone receptors.
For example, the androgen receptor (AR) is expressed in 60-80% of breast cancers and
is implicated in breast cancer biology (14). Recent studies have reported that among
postmenopausal women, high androgen levels are associated with an increased risk of
developing breast cancer (15). Furthermore, androgens can induce proliferation in breast
tissue and initiate tumor formation via the AR in animal models (16). The mechanisms
by which AR contributes to the initiation and progression of breast cancer and its
functional relationship to the ER are unknown. It also remains to be determined whether
targeting the AR could extend the benefits of hormonal therapy to women with
ER(-)/PR(-), AR-positive breast cancer.

Genome-wide transcript analysis using DNA microarray technology is an
important and well-established tool in the study of human disease. The technology
allows the measurement of several thousand mRNA species simultaneously. The
resulting  gene  expression  profiles  can  distinguish  tumor  classes  not  evident  by  traditional 
methods  (17,  18).  In  breast  cancer,  DNA  microarray  analysis  has  demonstrated  that 
ER(+)  breast  cancer  and  ER(-)/PR(-)  disease  have  unique  molecular  profiles,  identified 
several  distinct  molecular  subclasses  and  been  used  to  predict  disease  recurrence  (19-23). 
Few reports focus specifically on gene expression analysis of ER(-)/PR(-) breast cancers,
and those are limited by small sample size (24). We report the identification and
characterization of a unique ER(-)/PR(-) breast cancer subset with a hormonally regulated
gene expression signature and AR-dependent, androgen-induced cell growth in culture.
This represents a clinically relevant subset of ER(-)/PR(-) breast cancer for which AR
may provide a useful therapeutic target.

Methods 


Samples  and  Gene  Expression  Analysis 

Tissue  samples  were  obtained  from  therapeutic  or  diagnostic  procedures 
performed  as  part  of  routine  clinical  management  at  Memorial  Sloan-Kettering  Cancer 
Center.  All  research  procedures  using  human  tissue  were  approved  by  the  MSKCC 
institutional review board. Tissues were snap frozen in liquid nitrogen and stored at
-80°C. Each sample was examined histologically using hematoxylin and eosin-stained
cryostat sections and enriched for areas of interest by manual trimming of tissue blocks.
Total RNA was extracted from frozen tissue by homogenization in guanidinium
isothiocyanate-based buffer (Trizol; Invitrogen, Carlsbad, CA), purified using RNeasy
(Qiagen, Valencia, CA) and examined for quality by denaturing agarose gel electrophoresis.
Complementary  DNA  was  synthesized  from  RNA  using  a  T7-promoter-tagged  oligo-dT 
primer.  RNA  target  was  synthesized  from  cDNA  by  in  vitro  transcription,  and  labeled 
with  biotinylated  nucleotides  (Enzo  Biochem,  Farmingdale,  NY)  (25).  Gene  expression 
analysis  was  performed  using  HG-U133A  oligonucleotide  microarrays  according  to  the 
manufacturer’s instructions (Affymetrix, Santa Clara, CA). There were 99 primary breast
tumors analyzed (77 invasive ductal carcinomas, 10 invasive lobular carcinomas, 7 mixed
lobular and ductal carcinomas, 4 metaplastic carcinomas and one not specified). All
ER(-)/PR(-) tumors were designated invasive ductal or invasive lobular type.

Data  Analysis 

Signals  were  quantified  using  Affymetrix  Microarray  Suite  5.0  and  expression 
values  were  scaled  to  have  a  mean  expression  of  500  across  the  central  96%  of  values  for 
each  array.  Each  sample  was  individually  characterized  by  both  probe  set  intensity 
values  and  associated  clinical  data.  A  master  gene  table  was  compiled,  in  which  specific 
genes  represented  by  GenBank  accession  numbers  were  identified  for  each  probe  set 
(http://www.affymetrix.com).  Annotation  information  corresponding  to  the  GenBank 
accession  number  for  each  probe  set  was  retrieved  from  the  GenBank,  LocusLink, 
Unigene,  and  Gene  Ontology  Consortium  databases.  All  annotation  information  was 
downloaded  through  the  Silicon  Genetics  Mirror  server  using  the  GeneSpider  tool 
(GeneSpring,  Silicon  Genetics,  Redwood  City,  CA). 
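
The per-array scaling step can be illustrated with a short sketch. It assumes that "the central 96% of values" means discarding the lowest 2% and the highest 2% of signals on an array before taking the mean; the function name `scale_array` and the synthetic signals are illustrative only and are not taken from Microarray Suite 5.0.

```python
import numpy as np

def scale_array(signals, target=500.0, trim=0.02):
    """Scale one array so the trimmed mean of its signals equals `target`.

    Assumption: the "central 96%" is obtained by dropping the lowest and
    highest 2% of values; this is a sketch, not the vendor implementation.
    """
    signals = np.asarray(signals, dtype=float)
    ordered = np.sort(signals)
    k = int(np.floor(trim * ordered.size))      # values dropped from each tail
    central = ordered[k:ordered.size - k]
    return signals * (target / central.mean())

rng = np.random.default_rng(0)
raw = rng.lognormal(mean=5.0, sigma=1.0, size=1000)   # synthetic signals
scaled = scale_array(raw)
```

After scaling, the mean of the central 96% of `scaled` equals the target of 500, while the rank order of probe sets on the array is unchanged.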

Prior  to  unsupervised  analyses,  the  gene  expression  measurements  were  filtered 
and  normalized  using  the  following  methods.  We  included  probe  sets  that  varied  the 
most  across  samples.  Additionally,  a  probe  set  was  included  only  if  >10%  of  its 
measurements  exceeded  the  per-chip  mean  of  500.  For  each  array,  probe  set  values  were 
log2 transformed and centered to median = 0. Each array was then normalized by
multiplying all of its measurements by a scaling factor S such that the sum of the
squares of the values equaled 1. Each probe set measurement was centered and
normalized across samples according to the same procedure. Filtering and normalization
were performed independently for each analysis.
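
The filtering and normalization steps in this paragraph can be sketched as follows. Several details are assumptions made for illustration: variability is ranked by the standard deviation of log2 values, the number of probe sets retained (`n_top`) is arbitrary because the text does not state it, and the function names are hypothetical.

```python
import numpy as np

def center_and_scale(values, axis):
    """Median-center along `axis`, then scale so the sum of squares is 1."""
    centered = values - np.median(values, axis=axis, keepdims=True)
    norm = np.sqrt((centered ** 2).sum(axis=axis, keepdims=True))
    return centered / np.where(norm == 0, 1.0, norm)

def filter_and_normalize(expr, n_top=1000, floor=500.0, min_frac=0.10):
    """expr: probe sets (rows) x arrays (columns) of positive scaled signals."""
    expr = np.asarray(expr, dtype=float)
    # Keep probe sets with more than 10% of measurements above the per-chip mean.
    expressed = (expr > floor).mean(axis=1) > min_frac
    logged = np.log2(expr[expressed])
    # "Varied the most" is ranked here by standard deviation (an assumption).
    order = np.argsort(logged.std(axis=1))[::-1][:n_top]
    keep = np.flatnonzero(expressed)[np.sort(order)]
    data = np.log2(expr[keep])
    data = center_and_scale(data, axis=0)   # per array
    data = center_and_scale(data, axis=1)   # per probe set, same procedure
    return keep, data
```

`keep` holds the row indices of the retained probe sets, so the normalized matrix can be mapped back to the master gene table.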

Two-way unsupervised hierarchical clustering was performed using the software
Cluster 3.0 (26) and GeneSpring. To cluster data, we used an uncentered standard
correlation (Pearson correlation around zero) as our measure of similarity. In
constructing dendrograms, centroid linkage was used as the measure of proximity between
clusters. Principal component analysis (PCA) was performed using GeneSpring. Principal
components were calculated for a designated set of genes and samples, and the three
principal components representing the greatest variance in expression were plotted in
order to visualize samples in three-dimensional gene expression space.
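
As a rough illustration of the two quantities named above, the sketch below computes an uncentered correlation and projects samples onto their three leading principal components. It is not the Cluster 3.0 or GeneSpring implementation, it does not perform the centroid-linkage clustering itself, and the function names are hypothetical.

```python
import numpy as np

def uncentered_correlation(x, y):
    """Pearson correlation computed around zero (no mean subtraction)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

def top_components(data, n_components=3):
    """Project samples (rows) onto the leading principal components via SVD."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates
    explained = (s ** 2) / (s ** 2).sum()             # fraction of variance
    return scores, explained[:n_components]
```

The three columns of `scores` are the coordinates that would be plotted to view samples in three-dimensional expression space.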

To identify differentially expressed genes between two groups, we used two
different measures: the fold change (ratio) between the normalized means of each ER(-) class
and Student’s t-test. For gene expression data generated from cultured cells exposed to
different treatments, the data were filtered to include only probe sets with an absolute
expression value greater than 200 in at least one condition, and differential expression was
evaluated by fold change between different conditions.
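
A minimal sketch of the two measures and the expression filter is given below. It assumes the equal-variance (pooled) form of Student's t-test and returns only the t statistic; converting it to a p-value requires the t distribution with n_a + n_b - 2 degrees of freedom, which is omitted here to keep the example dependent on NumPy alone. Function names are illustrative.

```python
import numpy as np

def fold_change_and_t(group_a, group_b):
    """Per-gene fold change (ratio of means) and pooled-variance t statistic.

    group_a, group_b: genes (rows) x samples (columns).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    na, nb = a.shape[1], b.shape[1]
    mean_a, mean_b = a.mean(axis=1), b.mean(axis=1)
    pooled = ((na - 1) * a.var(axis=1, ddof=1) +
              (nb - 1) * b.var(axis=1, ddof=1)) / (na + nb - 2)
    t = (mean_a - mean_b) / np.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return mean_a / mean_b, t

def passes_expression_filter(expr, minimum=200.0):
    """True for probe sets exceeding `minimum` in at least one condition."""
    return (np.asarray(expr, dtype=float) > minimum).any(axis=1)
```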

Immunohistochemistry  (IHC) 

Immunohistochemical  detection  was  performed  using  streptavidin-biotin- 
peroxidase  and  microwave  antigen  retrieval  methodology  as  described  (25).  Tissue 
blocks  with  multiple  samples  were  prepared  using  a  tissue  arrayer  (Beecher  Instruments, 
Sun Prairie, WI). For each sample, three 0.6 mm core sections of tissue were extracted
from diagnostic areas of formalin-fixed, paraffin-embedded tissues. We defined Her2
positivity as 3+ by IHC, or 2+ by IHC with gene amplification. For ER, PR, AR, and
ERβ, samples were considered positive if greater than 10% of cell nuclei were
immunoreactive. Semi-quantitative analysis of ER expression was performed using
whole sections obtained from the original paraffin-embedded tissue samples. Signal
intensity was graded on a scale of 0-3. A final IHC score was computed by multiplying
the percent of positive nuclei by the intensity.
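
The scoring rules above reduce to simple arithmetic, sketched here with hypothetical function names. The score ranges from 0 to 300 when the percentage is expressed on a 0-100 scale, which is an assumption about the units used.

```python
def ihc_score(percent_positive, intensity):
    """Semi-quantitative score: percent positive nuclei (0-100) x intensity (0-3)."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if not 0 <= intensity <= 3:
        raise ValueError("intensity must be between 0 and 3")
    return percent_positive * intensity

def is_receptor_positive(percent_positive, threshold=10):
    """ER, PR, AR, ERbeta: positive when more than 10% of nuclei are immunoreactive."""
    return percent_positive > threshold

def is_her2_positive(ihc_grade, amplified):
    """Her2: 3+ by IHC, or 2+ by IHC together with gene amplification."""
    return ihc_grade == 3 or (ihc_grade == 2 and amplified)
```

For example, a tumor with 80% positive nuclei at intensity 3 scores 240, and a tumor with exactly 10% positive nuclei is classed as negative.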

Cell  Culture 

The  breast  cancer  cell  lines  MDA-MB-453,  MDA-MB-231,  SKBR-3,  HCC-1937, 
ZR75-1,  MCF7,  BT-474  and  T-47D  were  obtained  from  American  Type  Culture 
Collection (http://www.atcc.org). Cells were maintained at 37°C in a humidified
atmosphere containing 5% CO2, in 75 cm2 flasks containing Minimal Essential Medium
(MEM) supplemented with 10% fetal bovine serum, 2% L-glutamine, NEAA, 1 mM
sodium pyruvate, 1.5 g/L sodium bicarbonate, 100 I.U./ml penicillin and 100 μg/ml
streptomycin. Cells were passaged every 3-4 days when they reached 80% confluence,
and harvested with 0.25% trypsin/EDTA.

For  cell  proliferation  studies,  cells  were  pelleted  by  centrifugation  and 
resuspended  in  medium  containing  phenol  red-free  MEM  supplemented  with  10% 
charcoal-stripped fetal bovine serum (CSFBS) (Hyclone, Logan, UT), 2% L-glutamine,
NEAA, 1 mM sodium pyruvate, and 1.5 g/L sodium bicarbonate. Cells were plated in
replicates of 6 at a density of 1 x 10^4 cells/well in 96-well microtiter plates. Twenty-four
hours after seeding, cells were treated with various reagents, and media and reagents were
replenished every 3 days. Reagents used were 10 nM E2 (Sigma-Aldrich, St. Louis,
MO), 0.1-10 nM R-1881 (Sigma), 10 μM flutamide (Sigma), 100 nM 4-OHT (tamoxifen)
(Sigma), and 100 nM antiestrogen ICI 182780 (fulvestrant, ICI) (Tocris, Ellisville, MO).
Cell viability and proliferation were measured using the 3-(4,5-dimethylthiazol-2-yl)-2,5-
diphenyl tetrazolium bromide (MTT) colorimetric assay (American Type Culture
Collection, Rockville, MD) (27) and quantified by measuring absorbance at 570 nm
(Victor V7 microplate reader, Perkin Elmer, Wellesley, MA).

Genome-wide  expression  profiling  was  performed  for  MDA-MB-453  cells  in  six 
experimental  conditions  that  included  incubation  with  combinations  of  androgen,  AR 
antagonist, and vehicle control. The six expression time course experiments, referred to
as experiments I through VI, were performed simultaneously. Cells were grown to
confluence in one 125 cm2 flask, trypsinized, resuspended and seeded in six 75 cm2 flasks
at a density of 1 x 10^6 cells per flask. Cells were then incubated in media containing 10%
FBS until 60% confluence, washed with ice-cold PBS and treated with media and
reagents according to the six experimental conditions. Experiment I incubated cells in
media containing 10% FBS; Experiment II used charcoal-stripped media supplemented
with vehicle control; Experiment III used stripped media with 1 nM R-1881; Experiment
IV used stripped media with 1 nM R-1881 and 10 μM flutamide. For experiments I-IV, RNA was
extracted after 48 hours. In experiments V and VI, cells were incubated in stripped media
for 48 hours then exposed to either 1 nM R-1881 (V) or vehicle control (VI) for 48 hours
followed by RNA extraction.

Identification  of  ERE  and  ARE  Motifs 

For each probe set, GenBank accession numbers identified specific genes. For all genes,
9999 bp of sequence 5’ to the transcription start site was retrieved from the
ENSEMBL database using build NCBI 34 (Version 2), updated February 2004, from the
Silicon Genetics website (http://www.silicongenetics.com/Downloads/HumanGenome9999.zip). For
genes of interest, sequence within 1 to 5,000 bp upstream of the transcription start site was
analyzed for homology to the ERE consensus 5’-GGTCAnnnTGACC-3’ and the ARE
consensus 5’-AGAACAnnnTGTTCT-3’. We allowed for up to two single-base mismatches
in each sequence homology analysis. For genes identified as having putative regulatory
sequences,  a  false  positive  probability  was  estimated  by  observing  both  the  frequency  of 
the  regulatory  sequence  upstream  of  all  other  genes,  and  the  frequency  of  the  regulatory 
sequence  within  a  random  distribution  of  bases.  In  the  latter  case,  the  percent  occurrence 
of  each  base  in  the  random  distribution  is  set  to  equal  the  percent  occurrence  of  each  base 
within  the  sequence  in  question.  Genes  with  homologous  response  elements  were 
reported  if  the  higher  p-value  obtained  from  these  two  observations  was  less  than  0.0001. 
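
The motif search described above can be sketched as a sliding-window scan that tolerates up to two mismatches outside the `n` spacer positions. This is an illustrative reconstruction rather than the software actually used, and the false-positive estimation is not covered. Because both consensus sequences are palindromic (each equals its own reverse complement), scanning one strand is sufficient in this sketch.

```python
ERE = "GGTCANNNTGACC"
ARE = "AGAACANNNTGTTCT"

def mismatches(window, consensus):
    """Count positions where `window` differs from `consensus` (N matches any base)."""
    return sum(1 for w, c in zip(window, consensus) if c != "N" and w != c)

def find_motif(sequence, consensus, max_mismatches=2):
    """Return (start, matched_sequence, mismatch_count) for every hit."""
    sequence = sequence.upper()
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        n = mismatches(window, consensus)
        if n <= max_mismatches:
            hits.append((start, window, n))
    return hits

# Hypothetical upstream sequence with one perfect ERE and one ARE carrying a mismatch.
upstream = "TTTT" + "GGTCAACGTGACC" + "AAAA" + "AGAACATTTTGTACT" + "CC"
ere_hits = find_motif(upstream, ERE, max_mismatches=0)
are_hits = find_motif(upstream, ARE, max_mismatches=2)
```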

Class  Prediction 

A  prediction  algorithm  was  developed  in  order  to  identify  samples  which 
expressed  a  relevant  gene  signature.  Tissue  samples  were  assigned  to  a  subclass  based  on 
our  unsupervised  hierarchical  clustering  of  ER(-)/PR(-)  tumors.  Differentially  expressed 
genes between the two clusters (designated classes A and B) were ranked by Student’s
t-test and those with a p-value <0.0001 were selected for use in the prediction model. The
expression of each predictor gene was used to classify unknown samples using the
k-nearest neighbors method (18). Based on normalized expression values, we examined the 11
samples nearest (as measured by Euclidean distance) to each unclassified sample and, for each
class, computed a p-value for the likelihood of finding the observed number of this class
among the identified neighborhood members by chance, given the proportion of class
membership in the training set. The class with the lowest p-value was assigned to the
unclassified sample. We specified a p-value cutoff of 0.15, so that if there was not
sufficient evidence in favor of a particular class, no prediction was made. The p-value
cutoff is applied to the ratio of the p-value of the predicted class to that of the alternate class.
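
One way to read the prediction rule is sketched below. The text does not name the statistical test behind the per-class p-value; the hypergeometric upper tail used here is an assumption, as are the function names. The example training set simply mirrors the class sizes reported later (10 class A and 31 class B samples) using a single made-up feature.

```python
import math
import numpy as np

def hypergeom_tail(observed, k, class_size, total):
    """P(X >= observed) when k neighbors are drawn without replacement."""
    upper = min(k, class_size)
    numerator = sum(math.comb(class_size, i) * math.comb(total - class_size, k - i)
                    for i in range(observed, upper + 1))
    return numerator / math.comb(total, k)

def knn_predict(train, labels, sample, k=11, ratio_cutoff=0.15):
    """Return the predicted class, or None when the evidence is too weak.

    Assumes exactly two classes and at least k training samples.
    """
    train = np.asarray(train, dtype=float)
    labels = np.asarray(labels)
    dist = np.sqrt(((train - np.asarray(sample, dtype=float)) ** 2).sum(axis=1))
    nearest = labels[np.argsort(dist)[:k]]
    pvals = {}
    for cls in np.unique(labels):
        observed = int((nearest == cls).sum())
        pvals[cls] = hypergeom_tail(observed, k, int((labels == cls).sum()), labels.size)
    best, alternate = sorted(pvals, key=pvals.get)[:2]
    # No call is made unless the best class is clearly favored over the alternate.
    if pvals[best] / pvals[alternate] > ratio_cutoff:
        return None
    return best

# Toy training set: one feature, 10 class A samples followed by 31 class B samples.
train = np.arange(41, dtype=float).reshape(-1, 1)
labels = np.array(["A"] * 10 + ["B"] * 31)
```

With this toy data, a sample surrounded almost entirely by class A neighbors is called A, whereas a sample whose 11 neighbors split 3 to 8 gives a p-value ratio above 0.15 and receives no prediction.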

Results 

Molecular heterogeneity of ER(-)/PR(-) breast cancers demonstrated by genome-wide
expression analysis

In  order  to  explore  the  molecular  heterogeneity  of  breast  cancers  we  performed 
genome-wide  transcript  profiling  for  99  primary  breast  carcinomas  using  oligonucleotide 
microarrays.  In  all  cases  we  performed  immunohistochemical  assessment  of  ER  and  PR 
to  ensure  the  accuracy  of  receptor  status  and  determine  heterogeneity.  Forty-one  tumors 
were  ER(-)/PR(-),  2  were  ER(-)/PR(+),  and  56  were  ER(+).  As  a  further  evaluation  of 
correspondence  between  the  transcript  level  for  ER  determined  by  microarray  and  ER 
protein  expression,  we  developed  a  semiquantitative  IHC  score  for  ER.  We  compared 
this  protein  expression  score  with  the  mRNA  level  according  to  the  ESR1  probe  set 
intensity, and observed a strong positive correlation (Spearman rho = 0.834, p<0.01)
between  ER  protein  and  transcript  levels.  Unsupervised  hierarchical  clustering  revealed  a 
strong  association  between  ER  status  and  molecular  profile  as  previously  reported  (22). 
However,  9  ER(-)/PR(-)  breast  cancers  were  grouped  with  the  ER(+)  tumors  and  3  ER(+) 
samples  were  grouped  with  ER(-)/PR(-)  breast  cancers  (fig  1).  The  finding  of  breast 
cancers  molecularly  discordant  with  ER  status  suggested  heterogeneity  within  the  major 
breast cancer subtypes and was further explored.

We focused our studies on ER(-)/PR(-) breast cancers and performed
unsupervised hierarchical clustering limited to the 41 ER(-)/PR(-) tumors. Of the major
clusters  in  the  dendrogram,  it  was  of  particular  interest  that  the  9  ER-discordant  samples 
in  the  previous  analysis  were  all  closely  correlated  and  contained  in  a  single  cluster  with 
only  one  additional  case  (fig  2A).  To  evaluate  the  reproducibility  of  the  molecular 
subgroups  we  carried  out  a  principal  component  analysis  and  identified  the  three 
components  representing  the  greatest  variance  in  gene  expression  for  the  41  ER(-)/PR(-) 
samples.  Using  the  principal  components  to  plot  samples  in  three  dimensions,  these  same 
10  samples  were  distinct  from  the  other  ER(-)/PR(-)  samples,  demonstrating  relatively 
robust  molecular  phenotypes  (fig  2B).  Therefore  within  our  sample  set  of  ER(-)/PR(-) 
breast  cancers  we  detected  two  major  molecular  subdivisions:  one  composed  of  10 
samples  with  a  molecular  resemblance  to  ER(+)  breast  cancer  (referred  to  hereafter  as 
ER(-)  class  A)  and  another  composed  of  the  remaining  31  breast  cancers  (ER(-)  class  B). 

Characterization  of  genes  differentially  expressed  in  ER(-)/PR(-)  breast  cancer 
subtypes 

By  visual  inspection  of  two  dimensional  cluster  diagrams  it  was  evident  that  a 
number  of  gene  clusters  corresponding  to  differential  expression  in  ER(-)  class  A 
relative  to  other  ER(-)/PR(-)  breast  cancers  are  associated  with  ER(+)  tumors  (figure  1). 
These  initial  observations  suggested  that  ER(-)  class  A  tumors  expressed  a  molecular 
signature  common  to  ER(+)  breast  cancers  and  warranted  further  investigation.  We  first 
identified  202  genes  markedly  differentially  expressed  according  to  ER  status  (at  least 
three  fold  difference  between  the  means  of  ER(+)  and  ER(-)/PR(-)  cases  and  a  Student’s
t-test  p<0.0001).  One  hundred  forty  five  genes  were  over  expressed  in  ER(+)  breast
cancers  and  57  were  over  expressed  in  ER(-)/PR(-)  tumors  (supplementary  tables  2a  and 
2b).  Not  surprisingly  many  of  the  differentially  expressed  genes  have  been  identified  in 
previous  similar  analyses  (21,  22).  We  then  identified  142  genes  significantly 
differentially  expressed  between  ER(-)/PR(-)  class  A  and  class  B  samples.  Ninety-six
genes  were  over  expressed  and  46  genes  were  under  expressed  in  class  A  relative  to  class  B.  Of  the
96  genes  differentially  over  expressed  in  class  A,  12  have  been  reported  as 
experimentally  valid  direct  targets  of  the  ER  (28),  12  were  responsive  to  estrogen  in 
previous  genome  wide  molecular  studies  (29)  and  24  were  differentially  over  expressed 
in  ER(+)  tumors  compared  to  all  ER(-)/PR(-)  tumors  in  our  data  (table  1).  In  addition,  we 
searched  5  kb  of  DNA  sequence  5’  upstream  of  the  transcription  start  site  and  found  that
24  of  the  genes  over  expressed  in  class  A  had  promoter  regions  containing  at  least  one 
putative  ERE,  and  12  had  promoter  regions  with  at  least  one  putative  ARE.  Among  the 
46  genes  under  expressed  in  ER(-)  class  A,  3  genes  have  been  identified  as  experimental
targets  of  the  ER  (28),  4  genes  had  promoter  regions  containing  at  least  one  putative 
ERE,  and  2  genes  had  promoter  regions  containing  at  least  one  putative  ARE.  In 
addition,  5  genes  were  differentially  over  expressed  among  all  ER(-)/PR(-)  tumors 
compared  to  ER(+)  tumors  in  our  data  (table  1).  These  observations  suggested  that  ER(-) 
class  A  samples  more  closely  resembled  an  ER(+)  breast  cancer  molecular  phenotype  due 
to  expression  of  many  genes  believed  to  be  hormonally  regulated  based  on  data  from 
several  lines  of  investigation. 

In  order  to  further  evaluate  this  finding  we  obtained  UniGene  ID  numbers  for  386
estrogen  responsive  genes  identified  in  a  previously  published  genome  wide  expression
analysis  using  an  experimental  platform  different  from  that  used  in  our  study  (29),  and
identified  508  corresponding  Affymetrix  probe  sets  using  the  NetAffx  batch  query  tool
(www.affymetrix.com).  Supervised  cluster  analysis  limited  to  this  set  of  genes  tended  to 
group  ER(-)  class  A  samples  between  ER(+)  samples  and  the  remaining  ER(-)/PR(-) 
tumors  (supplemental  figure  1).  This  was  also  true  for  the  ER(-)  class  A  cell  line 
described  below.  This  provides  further  evidence  that  ER(-)  class  A  breast  cancers  were 
characterized  by  expression  of  estrogen  associated  gene  profiles  that  are  similar  to  those 
of  ER(+)  tumors. 

Immunohistochemical  analysis  of  gene  transcript  differences  between  ER(-)/PR(-) 
breast  cancer  subtypes 

In  order  to  further  evaluate  and  validate  the  molecular  differences  identified  by 
the  genome-wide  expression  analysis  using  alternative  techniques,  we  performed  IHC  for 
several  genes  differentially  expressed  between  ER(-)  class  A  and  ER(-)  class  B.  A
significantly  greater  proportion  of  ER(-)  class  A  samples  than  ER(-)  class  B  samples  was
immunoreactive  for  AR  and  FOXA1  (p=0.045  and  p=0.013  respectively,  Fisher’s
exact  test),  in  concordance  with  the  transcript  levels.  Protein  expression  of  ALCAM  and
SPDEF  was  analyzed  on  a  continuous  scale  using  an  IHC  score  (percentage  of  cells
staining  times  intensity).  There  was  a  significant  correlation  between  protein  and
transcript  expression  for  ALCAM  (Spearman  rho=0.55,  p=0.0002)  and  significant
differential  protein  expression  between  ER(-)  class  A  and  ER(-)  class  B  samples
(p=0.023,  Mann-Whitney  test).  Nuclear  expression  of  SPDEF  was  significantly  greater
for  ER(-)  class  A  samples  compared  to  ER(-)  class  B  samples  (p<0.0001,  Mann-Whitney
test)  (figure  3).
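
As an illustration of the Fisher's exact test used for the dichotomous IHC markers, the following minimal sketch computes a two-sided p-value for a 2x2 table from the hypergeometric distribution. The per-class positive and negative counts are not given in the text, so the counts in the example are hypothetical and the resulting p-value is not expected to match the reported values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x positives in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9)
    )


# Hypothetical counts: rows are class A and class B, columns are
# marker-positive and marker-negative samples.
p_marker = fisher_exact_two_sided(6, 4, 6, 25)
```

The small relative tolerance guards against floating-point ties when two tables have the same probability.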

We  also  evaluated  the  breast  cancer  marker  ERBB2  using  IHC  and  FISH.  The 
proportions  of  ERBB2  positive  samples  in  ER(-)  class  A  and  ER(-)  class  B  were  0.30  and
0.15  respectively,  in  good  agreement  with  the  ERBB2  transcript  levels.  IHC  was
also  used  to  evaluate  the  expression  of  ERβ.  Luminal  epithelial  cells  of  normal  breast
expressed  moderate  levels  of  ERβ;  however,  there  was  little  to  no  ERβ  protein  expression
detected  in  the  ER(-)/PR(-)  samples.

Because  several  genes  differentially  expressed  in  ER(-)  class  A  were  identifiable 
at  the  protein  level  by  IHC  in  FFPE  tissue  sections,  it  may  be  feasible  to  develop  a 
combination  of  IHC  markers  for  routine  clinical  identification  of  ER(-)  class  A  breast 
cancers.  A  combination  of  SPDEF  and  ALCAM  was  estimated  to  predict  ER(-)  class  A 
with  a  sensitivity  approaching  100%  (95%  C.I.  69  to  100%),  and  a  specificity  of  94% 
(95%  C.I.  79  to  99%).  It  is  important  to  note  that  this  analysis  is  limited  by  sample  size 
and  the  development  of  an  IHC  assay  for  routine  clinical  assessment  deserves  further 
study. 
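
The wide confidence intervals quoted above follow from the small sample size, which can be illustrated with an exact (Clopper-Pearson) binomial interval. The text does not state which interval method was used or the underlying counts; the sketch assumes 10 of 10 class A samples and 29 of 31 class B samples were classified correctly, counts inferred only because they are consistent with the reported percentages.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target, increasing):
    """Solve f(p) = target on [0, 1] for a monotone function f by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion k/n."""
    # Lower bound: p with P(X >= k | p) = alpha / 2 (defined as 0 when k == 0).
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True
    )
    # Upper bound: p with P(X <= k | p) = alpha / 2 (defined as 1 when k == n).
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False
    )
    return lower, upper


# Assumed counts (not stated in the text): sensitivity 10/10, specificity 29/31.
sensitivity_ci = clopper_pearson(10, 10)
specificity_ci = clopper_pearson(29, 31)
```

Under these assumed counts the intervals round to 69 to 100% and 79 to 99%, in line with the reported values, which shows how much uncertainty remains with only 41 samples.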

Class  prediction  and  independent  evaluation  of  ER(-)  breast  cancer  subsets 

In  order  to  determine  if  the  ER(-)  class  A  subclass  was  a  reproducible  finding  and 
to  identify  appropriate  breast  cancer  cell  lines  for  further  study,  we  developed  a  k-nearest
neighbor  classification  model  using  179  genes  that  were  differentially  expressed  (p-value 
<  0.0001)  between  ER(-)  class  A  and  all  other  ER(-)/PR(-)  tumors.  We  applied  this 
classification  method  to  an  independent,  publicly  available  breast  cancer  gene  expression
data  set  that  used  the  same  analytical  platform  (30).  A  similar  proportion  of  ER(-)/PR(-)
samples  was  classified  as  ER(-)  class  A  in  this  independent  data  as  in  our  samples  (32% 
of  ER(-)/PR(-)  tumors  vs.  24%).  Further  analysis  confirmed  that  a  number  of  genes 
differentially  expressed  in  the  comparison  of  ER(-)  class  A  and  ER(-)  class  B  in  our 
original  data  were  also  differentially  expressed  in  the  independent  predicted  subsets.  AR, 
CYB5,  XBP1,  FOXA1  and  SPDEF,  as  well  as  the  androgen  responsive  genes  APOD  (31) 
and  PIP  (32),  were  among  the  top  50  significantly  over  expressed  genes  in  the  predicted 
class  A  (p<1e-10).  It  is  interesting  to  note  that  ERBB2  and  FGFR4  were  also  among  the
top  ranked  genes,  highly  over  expressed  in  the  predicted  ER(-)  class  A  (p=1.6e-14  and 
p=1.4e-08  respectively)  of  the  independent  data.  Although  ERBB2  was  preferentially 
expressed  in  ER(-)  class  A  in  our  original  data  (p=0.00125),  the  highly  significant 
difference  in  absolute  expression  in  the  larger  predicted  ER(-)  class  A  more  strongly 
suggested  that  ERBB2  may  be  an  important  factor  in  the  molecular  phenotype. 
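
The k-nearest neighbor approach described above can be sketched as follows. The number of neighbors, the distance measure, and any normalization are not specified in this section, so the example assumes k=3 and Euclidean distance; the three-gene profiles and sample names are hypothetical stand-ins for the 179-gene signature.

```python
from collections import Counter
from math import dist


def knn_predict(train, labels, sample, k=3):
    """Label a sample by majority vote among its k nearest training profiles."""
    # Rank training samples by Euclidean distance to the new profile.
    nearest = sorted(train, key=lambda name: dist(train[name], sample))[:k]
    votes = Counter(labels[name] for name in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training profiles (log-scale expression of three genes).
train = {
    "A1": [9.0, 8.5, 2.0],
    "A2": [8.7, 9.1, 2.3],
    "A3": [9.2, 8.8, 1.7],
    "B1": [2.1, 2.4, 8.8],
    "B2": [1.8, 2.0, 9.1],
    "B3": [2.5, 1.9, 8.6],
}
labels = {name: "class A" if name.startswith("A") else "class B" for name in train}

# Classify two hypothetical samples from an independent data set.
call_1 = knn_predict(train, labels, [8.9, 8.6, 2.2])
call_2 = knn_predict(train, labels, [2.0, 2.2, 9.0])
```

An odd k avoids tied votes in a two-class problem; applying such a model across data sets assumes the expression values are on a comparable scale.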

Not  only  was  the  ER(-)  class  A  clearly  distinguishable  in  the  independent  data  by 
supervised  analysis,  but  unsupervised  approaches  indicated  that  these  classes  represent  a 
primary  distinction  among  ER(-)/PR(-)  tumors.  An  unsupervised  hierarchical  clustering 
of  the  77  ER(-)/PR(-)  tumors  yielded  primary  groups  of  samples  which  corresponded 
very  closely  to  the  class  prediction  assignments  by  the  predictive  model  (fig  4).  This 
provided  further  evidence  that  the  ER(-)  class  A  and  B  distinction  was  reproducible  and 
intrinsic  to  the  primary  molecular  substructure  of  ER(-)/PR(-)  tumors. 

We  then  used  the  prediction  model  to  evaluate  breast  cancer  cell  lines  in  order  to 
identify  ER(-)/PR(-)  cell  lines  corresponding  to  the  ER(-)  class  A  molecular  phenotype. 
Expression  profiles  were  generated  for  the  ER(-)/PR(-)  cell  lines  MDA-MB-231,
MDA-MB-453,  HCC-1937,  and  SKBR-3.  These  cell  lines  have  been  described  as  representing
important  distinctions  within  the  spectrum  of  ER(-)/PR(-)  disease  (33).  Our  classification 
model  identified  the  cell  line  MDA-MB-453  as  ER(-)  class  A  (p  value  ratio  =  5.75e-06), 
and  the  remaining  ER(-)/PR(-)  cell  lines  as  ER(-)  class  B.  We  have  therefore  used
MDA-MB-453  as  an  in  vitro  model  representing  the  ER(-)  class  A  molecular  phenotype.

The  ER(-)  class  A  cell  line  MDA-MB-453  shows  a  proliferative  response  to  androgen 
that  is  AR-dependent  and  ER-independent. 

The  identification  of  ER(-)/PR(-)  breast  tumors  characterized  by  expression 
profiles  including  estrogen  regulated  genes  suggested  an  ER-independent  mechanism  for 
activation  of  hormonally  responsive  transcription  that  contributed  to  tumor  growth  and 
survival.  In  order  to  define  the  mechanism  for  regulation  of  this  profile  we  first  sought  to 
determine  whether  low  levels  of  active  ER,  below  the  limit  of  detection  in  clinical  assays, 
might  be  contributing  to  growth  of  ER(-)  class  A  tumors.  In  our  class  A  model  cell  line
MDA-MB-453,  ER  transcript  levels  were  very  low  with  an  absolute  expression  of  38.0
and  Affymetrix  MAS  5.0  call  of  Absent.  Incubation  with  100  nM  E2  had  no  effect  on  cell
culture  growth  compared  to  vehicle  control.  Accordingly,  incubation  with  either  the
pure  anti-estrogen  ICI  or  tamoxifen,  alone  or  in  combination  with  100  nM  E2,  had  no
effect  on  overall  cell  viability  compared  to  vehicle  control  (fig  5A).  This  is  in  contrast  to
the  ER(+)  cell  line  MCF-7  that  was  markedly  growth  stimulated  by  administration  of
100  nM  E2,  and  this  effect  was  abrogated  by  the  addition  of  the  pure  anti-estrogen  ICI
(data  not  shown).  These  results  suggested  the  ER  was  not  playing  an  active  role  in  the
ER(-)  class  A  cell  line  growth  and  survival.

Because  there  is  the  potential  for  functional  overlap  of  transcriptional  regulation
by  steroid  hormone  receptors  we  reasoned  that  other  nuclear  receptors  might  play  a  role 
in  ER(-)  class  A  breast  cancers.  We  examined  the  expression  of  many  known  nuclear 
hormone  receptors,  including  ERβ,  ESRRA,  PR,  AR,  RARA,  RXRA,  GCHR,  PARP,
and  VDR  and  found  that  the  AR  was  the  only  one  strongly  differentially  over  expressed 
in  ER(-)  class  A.  The  AR  has  been  implicated  in  the  pathogenesis  of  breast  cancer  (16), 
and  it  is  known  to  activate  a  number  of  estrogen  responsive  genes  (34).  Incubation  with 
the  synthetic  non-metabolizable  androgen  R-1881  at  concentrations  between  0.1  nM  and
10  nM  stimulated  growth  in  MDA-MB-453.  This  proliferative  effect  was  abrogated  by
the  addition  of  the  AR  antagonist  flutamide,  confirming  that  the  response  was  AR 
dependent  (fig  5B).  Again  we  determined  that  the  effects  of  androgen  were  not
dependent  on  the  ER,  as  adding  the  antiestrogens  tamoxifen  or  ICI  to  androgen-treated
MDA-MB-453  cells  had  minimal  effect  on  the  androgen-induced
proliferation  (fig  5C).  These  observations  indicate  that  AR  signaling  is  intact  in  ER(-)
class  A  breast  cancer  cell  lines  and  that  cell  growth  and  survival  are  responsive  to 
androgen  in  an  AR-dependent,  ER-independent  manner. 

The  ER(-)  class  A  molecular  phenotype  is  androgen  dependent 

Because  the  ER(-)  class  A  cell  line  MDA-MB-453  demonstrated  a  proliferative 
response  to  androgen  we  set  out  to  determine  whether  this  was  associated  with  the 
transcriptional  program  characteristic  of  ER(-)  class  A  breast  cancers.  We  monitored 
gene  expression  changes  after  administration  of  androgens,  androgen  antagonists,  or 
vehicle  control  to  the  ER(-)  class  A  cell  line  MDA-MB-453  under  a  variety  of  growth
conditions  (see  Methods).  The  results  for  the  various  experiments  were  concordant  but
the  most  pronounced  differences  in  gene  expression  were  observed  between  those  cells
first  incubated  in  steroid  deprived  conditions  for  48  hours,  and  then  treated  with  either
R-1881  or  vehicle  for  48  hours.  After  trimming  to  eliminate  genes  with  very  low  level
expression  (<200  in  both  conditions),  497  genes  were  differentially  expressed  by  at  least 
two  fold  between  cells  exposed  to  R-1881  or  vehicle.  The  androgen  regulated  gene 
SARG  was  upregulated  by  247  fold,  and  has  been  previously  shown  to  contain  an 
experimentally  verified,  hormonally  active  androgen  response  element  (35).  Several  other 
androgen  responsive  genes  including  FASN,  NDRG1,  and  SORD  each  contain  putative 
androgen  response  elements  in  their  promoters  (36,  37),  and  were  upregulated  after 
administration  of  R-1881.  These  observations  provided  indirect  evidence  that 
administration  of  R-1881  to  MDA-MB-453  caused  recruitment  of  an  active  AR 
transcription  complex  to  highly  specific  AREs. 

To  evaluate  the  association  between  androgen  responsive  genes  in  MDA-MB-453 
and  the  ER(-)  class  A  molecular  phenotype,  we  compared  androgen  induced  gene 
expression  changes  to  genes  differentially  expressed  between  ER(-)  classes  A  and  B.  Of 
the  497  differentially  expressed  genes  between  cells  treated  with  R-1881  or  vehicle 
control,  22  were  common  to  our  179  gene  ER(-)  class  A  expression  signature,  and  this 
number  of  commonly  expressed  genes  was  higher  than  would  be  expected  by  chance 
alone  (p=3e-8)  (supplemental  table  3).  Therefore  the  genes  that  comprise  the  ER(-)  class 
A  molecular  fingerprint  were  at  least  in  part  androgen  responsive  in  the  class  A  cell  line. 
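
One common way to ask whether an overlap between two gene lists exceeds chance is an upper-tail hypergeometric probability. The text does not state which test produced the reported p-value or the size of the background gene set, so the sketch below assumes a hypergeometric model and a hypothetical background of 20,000 genes; its output is illustrative and is not expected to reproduce the reported value.

```python
from math import comb


def hypergeom_upper_tail(universe, marked, drawn, overlap):
    """P(X >= overlap) when `drawn` genes are sampled without replacement
    from `universe` genes, of which `marked` belong to the other list."""
    total = comb(universe, drawn)
    upper = min(marked, drawn)
    favorable = sum(
        comb(marked, x) * comb(universe - marked, drawn - x)
        for x in range(overlap, upper + 1)
    )
    # Exact integer arithmetic until the final division.
    return favorable / total


# 497 androgen responsive genes, 179 signature genes, 22 in common;
# the background size of 20,000 is an assumption, not a value from the text.
p_overlap = hypergeom_upper_tail(universe=20000, marked=497, drawn=179, overlap=22)
expected_overlap = 497 * 179 / 20000
```

The result depends strongly on the assumed background, so the gene universe should be restricted to genes that could have appeared on both lists.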

To  further  explore  the  association  between  the  ER(-)  class  A  molecular  phenotype 
and  an  androgen  dependent  transcription  program,  we  performed  principal  component
analysis  of  the  41  ER(-)/PR(-)  breast  tumors  using  the  497  androgen  responsive  genes
and  plotted  samples  based  on  three  principal  components.  The  ER(-)  class  A  and  B 
samples  formed  distinct  clusters  (fig  6A).  Furthermore,  the  same  approach  using  the  77 
ER(-)/PR(-)  samples  from  the  independent  data  set  demonstrated  clusters  corresponding 
to  our  class  predictions  (fig  6B).  These  results  suggested  that  the  ER(-)  class  A 
molecular  phenotype  was  partially  recapitulated  by  the  expression  of  genes  regulated  by 
androgen  in  ER(-)  class  A  breast  cancer  cells. 

We  also  determined  whether  genes  induced  by  androgens  in  MDA-MB-453 
corresponded  to  the  transcriptional  program  activated  by  estrogens  in  ER(+)  breast  cancer 
cells  and  therefore  could  contribute  to  the  molecular  relationship  between  ER(-)  class  A 
and  ER(+)  breast  cancers.  Fifty  of  the  497  androgen  responsive  genes  from  our 
experiments  were  in  common  with  the  386  estrogen  responsive  genes  determined  by  an 
independent  study  using  MCF7,  T-47D,  and  MDA-MB-436  breast  cancer  cells  (29).
This  number  of  common  genes  was  much  greater  than  would  be  expected  by  chance
(p=4e-16)  and  suggested  that  androgen  in  AR  positive  ER(-)/PR(-)  breast  cancer  cells  can 
induce  a  transcriptional  program  that  significantly  overlaps  with  that  induced  by  estrogen 
in  ER(+)  breast  cancer  cells. 

Discussion 

Clinicians  have  long  recognized  that  the  current  classification  of  breast  cancer 
based  on  HER2  status,  histopathological  grade  and  hormone  receptor  status  does  not 
sufficiently  capture  the  clinical  and  biologic  heterogeneity  observed  in  practice.  This  has 
fueled  efforts  to  develop  more  biologically  and  clinically  meaningful  classification  based
on  molecular  features.  We  applied  unsupervised  and  supervised  analyses  to  gene
expression  profiles  of  primary  breast  cancers,  and  identified  a  subset  of  ER(-)/PR(-) 
tumors  with  a  molecular  signature  that  suggests  an  active  hormonally  regulated 
transcriptional  program.  Gene  expression  signatures  were  used  to  develop  a  predictive 
model  that  identified  this  subset  among  novel  tissue  samples  and  breast  cancer  cell  lines. 
The  breast  cancer  cell  line  MDA-MB-453  recapitulated  the  molecular  phenotype  and  was 
used  to  investigate  the  biological  basis  for  this  subclass.  Several  molecules  that  can 
initiate  signal  transduction  contributing  to  tumor  growth  and  survival  were  over 
expressed  in  this  tumor  subset  including  AR,  ERBB2  and  FGFR4.  We  found  that 
androgen  enhanced  growth  of  MDA-MB-453  in  an  ER-independent  but  AR-dependent
manner.  In  addition,  the  ER(-)  class  A  molecular  phenotype  was  at  least  partially 
androgen  regulated.  Taken  together,  our  findings  help  to  define  a  distinctive  molecular 
subset  of  ER(-)/PR(-)  breast  cancer  with  the  potential  for  novel  targeted  therapeutic 
strategies. 

The  potential  for  molecular  subclassification  of  breast  cancers  based  on  genome 
wide  expression  analysis  has  been  well  documented  in  previous  studies.  Applying  a  class 
discovery  approach  using  cDNA  microarrays,  Perou  et  al  (20)  identified  at  least  5 
molecular  subtypes  of  breast  cancer  (termed  luminal  subtypes  A  and  B,  ERBB2,  basal, 
and  normal  breast  like).  These  subtypes  have  been  repeatedly  observed  in  independent 
data  sets  and  across  various  high  throughput  platforms  (38,  39).  The  luminal  subtype  A 
and  basal  groups  have  been  the  most  robust  in  independent  data  analysis.  This  luminal 
subtype  is  primarily  composed  of  ER(+)  tumors,  generally  demonstrates  a  better 
prognosis  and  is  characterized  by  relative  over  expression  of  estrogen  related  genes  such
as  GATA3,  XBP1,  FOXA1,  CCND1,  TFF3  and  ERα.  The  basal  class  is  so  named
because  its  expression  pattern  resembles  that  of  the  basal  epithelial  cell  component  of  the 
breast  including  lack  of  expression  of  ER  and  related  genes,  and  expression  of 
cytokeratins  5/6  and  17.  Hierarchical  clustering  of  our  data  using  the  intrinsic  gene  list 
revealed  that  the  luminal  A  and  basal  subtypes  were  clearly  evident,  while  the  remaining 
subtypes  were  not  nearly  as  distinct  (supplemental  figure  2).  In  particular  the  ER(-)  class 
A  subtype  we  have  described  tended  to  be  poorly  correlated  with  any  one  of  the  five 
subgroups.  This  is  similar  to  the  findings  of  others  and  suggests  that  other  subtypes  of
luminal  breast  cancer  require  refinement  of  their  molecular  definition  (38,  39).  The
ER(-)  class  A  samples  tend  to  be  distributed  among  the  luminal  A  and  other  non-basal  cases.
This  subset  is  most  distinct  when  clustering  is  not  limited  to  the  intrinsic  gene  set  and 
even  more  so  when  the  analysis  is  limited  to  ER(-)  breast  cancers.  Our  data  suggest  that
ER(-)  class  A  breast  cancers  bear  a  much  closer  molecular  relationship  to  luminal  or 
ER(+)  breast  cancers  than  to  the  basal  subtype  despite  the  shared  ER(-)  phenotype.  This 
observation  is  recapitulated  in  the  larger  independent  validation  set  of  77  ER(-)  breast 
cancers.  The  same  observation  in  two  separate  breast  cancer  cohorts  suggests  that  this 
subclass  of  ER(-)/PR(-)  breast  cancer  is  reproducible  and  distinct  with  important 
implications  for  the  diagnosis  and  treatment  of  women  with  ER(-)/PR(-)  breast  cancer. 

Our  studies  also  suggest  that  the  AR  may  play  an  important  role  in  regulating  the 
molecular  events  associated  with  ER(-)  class  A  breast  cancers.  Androgenic  effects  on  the 
proliferation  of  breast  cancer  cell  lines  are  highly  variable  (40),  an  observation  not 
particularly  surprising  considering  the  heterogeneity  of  AR  expression  in  breast  cancer 
and  the  complexity  of  AR  signaling.  While  several  breast  cancer  cell  lines  appear  to  be
growth  inhibited  by  the  addition  of  androgens,  a  number  are  growth  stimulated  and  may
be  androgen  dependent.  5-alpha-dihydrotestosterone  (DHT)  inhibits  the  estrogen  induced
proliferation  of  MCF-7  breast  cancer  cells  and  induces  a  partial  G1-S  phase  cell  cycle
arrest,  accompanied  by  an  increase  in  cdk2-associated  p21  (41).  In  contrast,  Lippman
et  al  suggest  that  androgens  stimulate  cell  proliferation  and  DNA  synthesis  in  an  AR
dependent  manner  in  some  cell  lines  (42).  In  agreement  with  our  results,  previous  studies 
have  reported  AR  dependent  androgen  induced  proliferation  in  the  breast  cancer  cell  line 
MDA-MB-453  (40,  43).  Our  data  further  suggest  that  this  proliferative  response  is 
associated  with  a  hormonally  regulated  transcriptional  program  that  is  common  to  ER(-) 
class  A  breast  cancers  and  overlaps  with  ER  induced  transcription  in  ER(+)  tumors. 
However,  the  overlap  is  incomplete,  and  this  may  reflect  the  fact  that  an  integrated 
network  of  signaling  pathways  regulates  cell  proliferation.  We  speculate  that  AR  may  act 
in  concert  with  other  signal  transduction  pathways  to  contribute  to  the  ER(-)  class  A 
molecular  phenotype.  For  example,  it  is  well  known  that  receptor  tyrosine  kinase 
pathways  function  as  modulators  of  nuclear  hormone  receptor  activity  (44)  and  in  this 
regard  it  is  interesting  that  ERBB2  is  differentially  expressed  in  ER(-)  class  A  breast 
cancers.  ERBB2  has  been  shown  to  stabilize  AR  protein  levels  and  optimize  binding  of 
AR  to  promoters  of  androgen  regulated  genes  in  prostate  cancer  cells  (45).  In  the  ER(-) 
class  A  breast  cancer  line  MDA-MB-453,  blocking  ERBB2  with  PKI166  inhibits  PI3K 
signaling,  deactivates  mTOR  and  decreases  cell  proliferation  (46).  Given  the  proliferative 
effect  of  androgen  on  MDA-MB-453  that  we  have  shown,  the  potential  for  cooperative 
crosstalk  between  ERBB2  signaling  and  AR  deserves  further  study.  In  addition,  the
antiproliferative  effect  of  antiandrogens  on  MDA-MB-453  provides  the  rationale  for  the
study  of  antiandrogens  to  treat  ER(-)  class  A  breast  cancer. 

It  is  likely  that  some  cases  of  ER(-)  class  A  breast  cancer  are  influenced  by 
active  ERBB2  signaling.  However,  results  of  unsupervised  hierarchical  clustering  of  99 
primary  tumors  revealed  expression  of  ERBB2  among  several  sample  clusters,  and 
suggested  that  the  expression  of  ERBB2  alone  does  not  capture  the  molecular  phenotype 
of  class  A  breast  cancer.  Indeed,  among  the  class  A  samples  in  our  data,  only  30%  were 
ERBB2  positive.  Furthermore,  SKBR-3  cells  have  ERBB2  gene  amplification  and 
protein  over  expression  (33),  and  were  identified  by  our  predictor  as  class  B.  Not 
surprisingly,  the  ERBB2  monoclonal  antibody  trastuzumab  inhibits  the  growth  of
SKBR-3  cells  (47,  48).  The  ER(-)  class  A  cell  line  MDA-MB-453  also  overexpresses  ERBB2.
However,  MDA-MB-453  cells  are  not  ERBB2-amplified  and  are  resistant  to  the 
antiproliferative  effects  of  trastuzumab  (48).  The  expression  of  ERBB2  represents  a 
biologically  and  clinically  important  feature  of  breast  cancer,  and  a  molecular  subtype 
characterized  by  ERBB2  over  expression  has  been  proposed  (20).  Our  observations 
suggest  heterogeneity  within  the  ERBB2  molecular  subtype.  Indeed,  ERBB2  over 
expression  exists  in  estrogen  responsive,  ER(+)  breast  cancer  as  well  as  ER(-)  breast 
cancer.  Further  investigation  into  the  diversity  of  ERBB2  signaling  among  various  breast 
cancer  subtypes  is  required. 

FGFR4  is  another  signaling  molecule  which  may  cooperate  with  AR  and  ERBB2
to  drive  tumor  growth  in  the  ER(-)  class  A  subtype  of  ER(-)/PR(-)  breast  cancer.  FGFR4
is  over  expressed  in  ER(-)  class  A  tumors  and  gene  amplification  may  exist  in  as  many  as 
30%  of  all  breast  cancers  (49).  In  MDA-MB-453  cells,  FGFR4  and  ERBB2  have  been
shown  to  work  in  concert  to  activate  the  mTOR  translational  pathway  and  regulate  cyclin
D1  levels  (46).  Simultaneous  inhibition  of  both  pathways  had  a  stronger  antiproliferative 
effect  than  either  alone.  In  addition,  FGFR4  dependent  activation  of  the  MAPK/ERK1/2 
signaling  cascade  can  drive  cell  proliferation  via  downstream  initiation  of  cyclin  D1 
transcription  (50).  This  convergence  of  data  suggests  that  further  investigation  into  the 
role  of  FGFR4,  ERBB2  and  AR  in  ER(-)  class  A  breast  cancers  is  warranted  and  that  this 
molecular  complex  may  provide  useful  therapeutic  targets  for  as  many  as  25%  of
ER(-)/PR(-)  breast  cancer  patients.


Acknowledgements 

We  would  like  to  thank  Dr.  Dennis  Watson  for  generously  providing  SPDEF 
antibody;  Yixin  Wang  and  John  A.  Foekens  for  providing  unpublished  data;  Adam 
Olshen  for  advice  and  manuscript  review;  Louis  Vargus,  Yonghong  Xiao,  and  Lishi  Chen 
for  technical  assistance;  and  Faye  Taylor  for  data  management.  We  are  indebted  to  the 
Pathology  and  Genomics  Core  Facilities  at  MSKCC  for  technical  support.  Supported  by 
DAMD  BC010104  to  WG. 



References 


1.  Jemal  A,  Murray  T,  Ward  E,  et  al.  Cancer  statistics,  2005.  CA  Cancer  J  Clin 
2005;55(1):10-30.

2.  Gruber  CJ,  Tschugguel  W,  Schneeberger  C,  Huber  JC.  Production  and  actions  of 
estrogens.  N  Engl  J  Med  2002;346(5):340-52. 

3.  McKenna  NJ,  O'Malley  BW.  Minireview:  nuclear  receptor  coactivators--an
update.  Endocrinology  2002;143(7):2461-5. 

4.  Losel  RM,  Falkenstein  E,  Feuring  M,  et  al.  Nongenomic  steroid  action: 
controversies,  questions,  and  answers.  Physiol  Rev  2003;83(3):965-1016. 

5.  Cato  AC,  Nestl  A,  Mink  S.  Rapid  actions  of  steroid  receptors  in  cellular  signaling 
pathways.  Sci  STKE  2002;2002(138):RE9. 

6.  Tobias  JS.  Recent  advances  in  endocrine  therapy  for  postmenopausal  women  with 
early  breast  cancer:  implications  for  treatment  and  prevention.  Ann  Oncol
2004;15(12):1738-47.

7.  Osborne  CK,  Wakeling  A,  Nicholson  RI.  Fulvestrant:  an  oestrogen  receptor 
antagonist  with  a  novel  mechanism  of  action.  Br  J  Cancer  2004;90  Suppl  1:S2-6.

8.  Smith  IE,  Dowsett  M.  Aromatase  inhibitors  in  breast  cancer.  N  Engl  J  Med 
2003;348(24):2431-42.

9.  Mouridsen  H,  Gershanovich  M,  Sun  Y,  et  al.  Superior  efficacy  of  letrozole  versus 
tamoxifen  as  first-line  therapy  for  postmenopausal  women  with  advanced  breast  cancer: 
results  of  a  phase  III  study  of  the  International  Letrozole  Breast  Cancer  Group.  J  Clin
Oncol  2001;19(10):2596-606.

10.  Howell  A,  Cuzick  J,  Baum  M,  et  al.  Results  of  the  ATAC  (Arimidex,  Tamoxifen,
Alone  or  in  Combination)  trial  after  completion  of  5  years'  adjuvant  treatment  for  breast 
cancer.  Lancet  2005;365(9453):60-2. 

11.  Goss  PE,  Ingle  JN,  Martino  S,  et  al.  A  randomized  trial  of  letrozole  in
postmenopausal  women  after  five  years  of  tamoxifen  therapy  for  early-stage  breast
cancer.  N  Engl  J  Med  2003;349(19):1793-802.

12.  Slamon  DJ,  Leyland-Jones  B,  Shak  S,  et  al.  Use  of  chemotherapy  plus  a
monoclonal  antibody  against  HER2  for  metastatic  breast  cancer  that  overexpresses
HER2.  N  Engl  J  Med  2001;344(11):783-92.

13.  Lal  P,  Tan  LK,  Chen  B.  Correlation  of  HER-2  status  with  estrogen  and
progesterone  receptors  and  histologic  features  in  3,655  invasive  breast  carcinomas.  Am  J 
Clin  Pathol  2005;123(4):541-6. 

14.  Isola  JJ.  Immunohistochemical  demonstration  of  androgen  receptor  in  breast 
cancer  and  its  relationship  to  other  prognostic  factors.  J  Pathol  1993;170(1):31-5.

15.  Agoff  SN,  Swanson  PE,  Linden  H,  Hawes  SE,  Lawton  TJ.  Androgen  receptor 
expression  in  estrogen  receptor-negative  breast  cancer.  Immunohistochemical,  clinical, 
and  prognostic  associations.  Am  J  Clin  Pathol  2003;120(5):725-31. 

16.  Wong  YC,  Xie  B.  The  role  of  androgens  in  mammary  carcinogenesis.  Ital  J  Anat 
Embryol  2001;106(2  Suppl  1):111-25.

17.  DeRisi  J,  Penland  L,  Brown  PO,  et  al.  Use  of  a  cDNA  microarray  to  analyse  gene 
expression  patterns  in  human  cancer.  Nat  Genet  1996;14(4):457-60. 




18.  Golub  TR,  Slonim  DK,  Tamayo  P,  et  al.  Molecular  classification  of  cancer:  class 
discovery  and  class  prediction  by  gene  expression  monitoring.  Science 
1999;286(5439):531-7. 

19.  van 't  Veer  LJ,  Dai  H,  van  de  Vijver  MJ,  et  al.  Gene  expression  profiling  predicts 
clinical  outcome  of  breast  cancer.  Nature  2002;415(6871):530-6. 

20.  Perou  CM,  Sorlie  T,  Eisen  MB,  et  al.  Molecular  portraits  of  human  breast 
tumours.  Nature  2000;406(6797):747-52. 

21.  West  M,  Blanchette  C,  Dressman  H,  et  al.  Predicting  the  clinical  status  of  human 
breast  cancer  by  using  gene  expression  profiles.  Proc  Natl  Acad  Sci  U  S  A 
2001;98(20):11462-7.

22.  Gruvberger  S,  Ringner  M,  Chen  Y,  et  al.  Estrogen  receptor  status  in  breast  cancer 
is  associated  with  remarkably  distinct  gene  expression  patterns.  Cancer  Res 
2001;61(16):5979-84. 

23.  Pusztai  L,  Ayers  M,  Stec  J,  et  al.  Gene  expression  profiles  obtained  from  fine- 
needle  aspirations  of  breast  cancer  reliably  identify  routine  prognostic  markers  and  reveal 
large-scale  molecular  differences  between  estrogen-negative  and  estrogen-positive 
tumors.  Clin  Cancer  Res  2003;9(7):2406-15. 

24.  Nagahata  T,  Onda  M,  Emi  M,  et  al.  Expression  profiling  to  predict  postoperative 
prognosis  for  estrogen  receptor-negative  breast  cancers  by  analysis  of  25,344  genes  on  a 
cDNA  microarray.  Cancer  Sci  2004;95(3):218-25. 

25.  Holzbeierlein  J,  Lai  P,  LaTulippe  E,  et  al.  Gene  expression  analysis  of  human 
prostate  carcinoma  during  hormonal  therapy  identifies  androgen-responsive  genes  and 
mechanisms of therapy resistance. Am J Pathol 2004;164(1):217-27.

26.  de  Hoon  MJ,  Imoto  S,  Nolan  J,  Miyano  S.  Open  source  clustering  software. 
Bioinformatics 2004;20(9):1453-4.

27.  Mosmann  T.  Rapid  colorimetric  assay  for  cellular  growth  and  survival: 
application to proliferation and cytotoxicity assays. J Immunol Methods 1983;65(1-2):55-63.

28.  Tang  S,  Han  H,  Bajic  VB.  ERGDB:  Estrogen  Responsive  Genes  Database. 
Nucleic  Acids  Res  2004;32(Database  issue):D533-6. 

29.  Cunliffe  HE,  Ringner  M,  Bilke  S,  et  al.  The  gene  expression  response  of  breast 
cancer  to  growth  regulators:  patterns  and  correlation  with  tumor  expression  profiles. 
Cancer  Res  2003;63(21):7158-66. 

30.  Wang  Y,  Klijn  JG,  Zhang  Y,  et  al.  Gene-expression  profiles  to  predict  distant 
metastasis  of  lymph-node-negative  primary  breast  cancer.  Lancet  2005;365(9460):671-9. 

31.  Hall  RE,  Aspinall  JO,  Horsfall  DJ,  et  al.  Expression  of  the  androgen  receptor  and 
an  androgen-responsive  protein,  apolipoprotein  D,  in  human  breast  cancer.  Br  J  Cancer 
1996;74(8):1175-80.

32.  Carsol  JL,  Gingras  S,  Simard  J.  Synergistic  action  of  prolactin  (PRL)  and 
androgen  on  PRL-inducible  protein  gene  expression  in  human  breast  cancer  cells:  a 
unique  model  for  functional  cooperation  between  signal  transducer  and  activator  of 
transcription-5 and androgen receptor. Mol Endocrinol 2002;16(7):1696-710.

33.  Lacroix  M,  Leclercq  G.  Relevance  of  breast  cancer  cell  lines  as  models  for  breast 
tumours:  an  update.  Breast  Cancer  Res  Treat  2004;83(3):249-89. 




34.  Nantermet  PV,  Masarachia  P,  Gentile  MA,  et  al.  Androgenic  induction  of  growth 
and  differentiation  in  the  rodent  uterus  involves  the  modulation  of  estrogen-regulated 
genetic  pathways.  Endocrinology  2005;146(2):564-78. 

35.  Steketee  K,  Ziel-van  der  Made  AC,  van  der  Korput  HA,  Houtsmuller  AB, 
Trapman  J.  A  bioinformatics-based  functional  analysis  shows  that  the  specifically 
androgen-regulated  gene  SARG  contains  an  active  direct  repeat  androgen  response 
element  in  the  first  intron.  J  Mol  Endocrinol  2004;33(2):477-91. 

36.  Dhanasekaran  SM,  Dash  A,  Yu  J,  et  al.  Molecular  profiling  of  human  prostate 
tissues:  insights  into  gene  expression  patterns  of  prostate  development  during  puberty. 
Faseb  J  2005;19(2):243-5. 

37.  Nelson  PS,  Clegg  N,  Arnold  H,  et  al.  The  program  of  androgen-responsive  genes 
in neoplastic prostate epithelium. Proc Natl Acad Sci U S A 2002;99(18):11890-5.

38.  Sorlie  T,  Tibshirani  R,  Parker  J,  et  al.  Repeated  observation  of  breast  tumor 
subtypes  in  independent  gene  expression  data  sets.  Proc  Natl  Acad  Sci  U  S  A 
2003;100(14):8418-23.

39.  Wilson  CA,  Dering  J.  Recent  translational  research:  microarray  expression 
profiling  of  breast  cancer— beyond  classification  and  prognostic  markers?  Breast  Cancer 
Res  2004;6(5):192-200. 

40.  Birrell  SN,  Bentel  JM,  Hickey  TE,  et  al.  Androgens  induce  divergent  proliferative 
responses  in  human  breast  cancer  cell  lines.  J  Steroid  Biochem  Mol  Biol  1995;52(5):459- 
67. 

41.  Greeve  MA,  Allan  RK,  Harvey  JM,  Bentel  JM.  Inhibition  of  MCF-7  breast  cancer 
cell proliferation by 5alpha-dihydrotestosterone; a role for p21(Cip1/Waf1). J Mol
Endocrinol 2004;32(3):793-810.

42.  Lippman  M,  Bolan  G,  Huff  K.  The  effects  of  androgens  and  antiandrogens  on 
hormone-responsive  human  breast  cancer  in  long-term  tissue  culture.  Cancer  Res 
1976;36(12):4610-8. 

43.  Hall  RE,  Birrell  SN,  Tilley  WD,  Sutherland  RL.  MDA-MB-453,  an  androgen- 
responsive  human  breast  carcinoma  cell  line  with  high  level  androgen  receptor 
expression.  Eur  J  Cancer  1994;30A(4):484-90. 

44.  Shao  D,  Lazar  MA.  Modulating  nuclear  receptor  function:  may  the  phos  be  with 
you. J Clin Invest 1999;103(12):1617-8.

45.  Mellinghoff  IK,  Vivanco  I,  Kwon  A,  Tran  C,  Wongvipat  J,  Sawyers  CL. 
HER2/neu  kinase-dependent  modulation  of  androgen  receptor  function  through  effects  on 
DNA binding and stability. Cancer Cell 2004;6(5):517-27.

46.  Koziczak  M,  Hynes  NE.  Cooperation  between  fibroblast  growth  factor  receptor-4 
and ErbB2 in regulation of cyclin D1 translation. J Biol Chem 2004;279(48):50004-11.

47.  Mayfield  S,  Vaughn  JP,  Kute  TE.  DNA  strand  breaks  and  cell  cycle  perturbation 
in herceptin treated breast cancer cell lines. Breast Cancer Res Treat 2001;70(2):123-9.

48.  Yakes  FM,  Chinratanalab  W,  Ritter  CA,  King  W,  Seelig  S,  Arteaga  CL. 
Herceptin-induced inhibition of phosphatidylinositol-3 kinase and Akt is required for
antibody-mediated effects on p27, cyclin D1, and antitumor action. Cancer Res
2002;62(14):4132-41.

49.  Dickson  C,  Spencer-Dene  B,  Dillon  C,  Fantl  V.  Tyrosine  kinase  signalling  in 
breast  cancer:  fibroblast  growth  factors  and  their  receptors.  Breast  Cancer  Res 
2000;2(3):191-6. 




50.  Koziczak  M,  Holbro  T,  Hynes  NE.  Blocking  of  FGFR  signaling  inhibits  breast 
cancer  cell  proliferation  through  downregulation  of  D-type  cyclins.  Oncogene 
2004;23(20):3501-8. 




Table  1 




Gene  Name 


214451_at
209173_at
217276_x_at
216623_x_at
217284_x_at
206509_at
214774_x_at
214243_s_at
206463_s_at
209309_at
215108_x_at
201525_at
213441_x_at
204667_at
217562_at
209813_x_at

value 


5.04E-11 


7.88E-07 


2.36E-08 


1.14E-05 


1.81E-09 


2.56E-07 


1.02E-08 


2.29E-06 


5.46E-08 


6.88E-06 


1.37E-06 


2.05E-10 


1.78E-05 


1.70E-13 


1.58E-05 


7.75E-05 


1.80E-07 


7.43E-08 


7.99E-11 


1.80E-12 


5.78E-05 


5.71E-05 


Class  A  v  B 
Fold 

Change 

Common 

Name 

Description 

44.66 

TFAP2B 

transcription  factor  AP-2 
beta  (activating  enhancer 

25.86 

AGR2 

anterior  gradient  2 
homolog (Xenopus

22.61 

dJ222E13.1 

kraken-like 

20.46 

TNRC9 

trinucleotide  repeat 
containing  9 

20.06 

HMGCS2 

3-hydroxy-3- 

methylglutaryl-Coenzyme 

19.42 

dJ222E13.1

kraken-like 

17.82 

PIP 

prolactin-induced  protein 

16.7 

TNRC9 

trinucleotide  repeat 
containing  9 

15.55 

dJ222E13.1 

kraken-like 

14.47 

CRISP3 

cysteine-rich  secretory 
protein  3 

14.41 

DHRS2 

dehydrogenase/reductase
(SDR family) member 2

13.99 

SPDEF 

SAM  pointed  domain 
containing  ets 

13.8 

AZGP1 

alpha-2-glycoprotein  1, 
zinc 

12.31 

SPDEF 

SAM  pointed  domain 
containing  ets 

11.79 

TNRC9 

trinucleotide  repeat 
containing  9 

11.33 

CLCA2 

chloride  channel,  calcium 
activated,  family  member 

11.05 

APOD 

apolipoprotein  D 

9.676 

LASS4 

LAG1  longevity 
assurance homolog 4 (S.

9.494 

SPDEF 

SAM  pointed  domain 
containing  ets 

7.782 

FOXA1 

forkhead  box  A1 

7.704 

DBCCR1L 

DBCCR1-like

7.691 

TRGV9;  V2;  T 

T-cell  receptor  (V-J-C) 
precursor;  Human  T-cell 



214079_at
210576_at
205221_at
213884_s_at
211657_at
204623_at
39763_at
204719_at
220622_at
211110_s_at
218313_s_at
204942_s_at
203722_at
210056_at
219734_at
221584_s_at
219197_s_at
204462_s_at
215465_at


4.21E-05 


6.64E-05 


1.64E-06 


1.04E-09 


2.95E-05 


9.51E-07 


2.61E-07 


5.11E-12 


2.12E-10 


7.17E-05 


1.24E-05 


3.00E-05 


1.46E-05 


2.45E-06 


1.31E-07 


1.06E-05 


9.47E-09 


5.96E-10 


7.57E-06 


2.71E-05 


4.10E-07 


2.10E-05 


7.428  DHRS2 


7.337  CYP4F8 


6.974  HGD 


6.785 TRIM3


6.542 


6.317  FGFR4 


6.007  TFF3 


5.895  MLPH 


5.599  HPX 


5.233  ABCA8 


5.222 FLJ23259


5.165  HPX 


AR 


4.85  GALNT7 


4.721  ALDH3B2 


4.681  ALDH4A1 


4.538  RND1 


4.512 FLJ20174


4.344  KCNMA1 


4.218  SCUBE2 


4.158  SLC16A2 


4.091  TRG@ 


4.083  PEX11A 


4.009  ABCA12 


dehydrogenase/reductase
(SDR family) member 2


cytochrome P450, family
4, subfamily F,


homogentisate 1,2-
dioxygenase


tripartite motif-
containing 3


fibroblast growth factor
receptor 4


trefoil factor 3
(intestinal


melanophilin


precursor; Human
hemopexin gene, exon


hypothetical protein
FLJ23259


hemopexin


androgen receptor
(dihydrotestosterone


UDP-N-acetyl-alpha-D-
galactosamine:polypeptid


aldehyde  dehydrogenase 
3  family,  member  B2 


aldehyde  dehydrogenase 
4  family,  member  A1 


Rho  family  GTPase  1 


hypothetical  protein 
FLJ20174 


potassium  large 
conductance  calcium- 


signal  peptide,  CUB 
domain,  EGF-like  2 


solute  carrier  family  16 
monocarboxylic  acid 


T  cell  receptor  gamma 
locus 


peroxisomal  biogenesis 
factor  11A 




217014_s_at
215806_x_at
201952_at
212218_s_at
205306_x_at
207843_x_at
215726_s_at
209366_x_at
218546_at


5.00E-05 


1.22E-08 


3.35E-06 


1.11E-05 


5.13E-05 


8.81E-08 


1.73E-05 


4.12E-07 


2.63E-05 


1.43E-06 


8.16E-06 


1.17E-06 


5.72E-07 


4.43E-05 


2.19E-05 


8.33E-05 


3.925 


3.84  TRG@ 


3.671  XBP1 


3.568  ALCAM 


3.562  FASN 


3.491 


3.479  KIAA0089 


3.441  KMO 


3.426  CYB5 


3.305  TNXB 


3.224  CYB5 


3.204  KIAA0644 


3.185  CYB5 


3.144  GALNT6 


3.124  FA2H 


3.095 FLJ14146


3.082  CRAT 


Homo  sapiens  PAC  clone 
RP4-604G5  from  7, 


T  cell  receptor  gamma 
locus 


activated  leukocyte  cell 
adhesion  molecule 


fatty  acid  synthase 


MRNA,  chromosome  1 
specific  transcript 


KIAA0089  protein 


kynurenine 3-
monooxygenase


cytochrome  b-5 


tenascin  XB 


cytochrome  b-5 


KIAA0644 gene product


cytochrome  b-5 


UDP-N-acetyl-alpha-D-
galactosamine:polypeptid


fatty acid 2-hydroxylase


hypothetical protein
FLJ14146


carnitine
acetyltransferase


Probe set | P value | Fold change (Class A v B) | Common name | Description
208284_x_at | 1.74E-06 | 3.022 | GGT1 | gamma-glutamyltransferase 1
215559_at | 6.47E-05 | 3.013 | ABCC6 | ATP-binding cassette, sub-family C
218776_s_at | 4.25E-08 | 2.859 | FLJ23375 | hypothetical protein FLJ23375
204579_at | 3.21E-06 | [illegible] | FGFR4 | fibroblast growth factor receptor 4
207131_x_at | 6.49E-07 | 2.764 | GGT1 | gamma-glutamyltransferase 1

206850  at 


2.84E-06 


2.54E-05 


2.74  DCXR 


2.71  RRP22 


dicarbonyl/L-xylulose 

reductase 


RAS-related  on 
chromosome  22 


Probe set | P value | Fold change (Class A v B) | Common name | Description
212593_s_at | 1.01E-06 | 2.706 | PDCD4 | programmed cell death 4 (neoplastic
[illegible] | 2.72E-05 | 2.683 | PEX11A | peroxisomal biogenesis factor 11A
203740_at | 3.34E-07 | 2.676 | MPHOSPH6 | M-phase phosphoprotein 6
211417_x_at | 1.79E-07 | 2.591 | GGT1 | gamma-glutamyltransferase 1
209919_x_at | 5.16E-07 | 2.552 | GGT1 | gamma-glutamyltransferase 1
[illegible] | 2.70E-05 | 2.536 | PRLR | prolactin receptor
213557_at | 4.36E-05 | 2.492 | | Transcribed sequences
[illegible] | 1.10E-06 | 2.436 | PDCD4 | programmed cell death 4 (neoplastic
201941_at | 5.57E-06 | 2.372 | CPD | carboxypeptidase D
212736_at | 8.29E-05 | 2.333 | BC008967 | hypothetical gene BC008967
218552_at | 9.75E-06 | 2.306 | FLJ10948 | hypothetical protein FLJ10948
[illegible] | 9.72E-07 | 2.24 | | Clone IMAGE:4816940, mRNA
212099_at | 7.79E-05 | 2.189 | ARHB | ras homolog gene family, member B
[illegible] | 2.67E-06 | 2.187 | CIRBP | [illegible]
212956_at | 4.61E-06 | 2.166 | qp61g12.x1 | NCI CGAP Co8 Homo
200618_at | 8.97E-05 | 2.151 | LASP1 | LIM and SH3 protein 1
211596_s_at | 7.34E-05 | 2.146 | LRIG1 | leucine-rich repeats and immunoglobulin-like
213107_at | 8.17E-05 | 2.139 | | yh03e12.s1 Soares infant brain 1NIB Homo sapiens
208872_s_at | 6.53E-06 | 2.131 | DP1 | polyposis locus protein 1
215603_x_at | [illegible] | 2.122 | GGT2 | gamma-glutamyltransferase 2
[illegible] | 4.27E-05 | 2.116 | NEIL1 | nei endonuclease VIII-like 1 (E. coli)
211621_at | 9.37E-06 | 2.105 | AR | androgen receptor (dihydrotestosterone
[illegible] | 2.29E-05 | 2.019 | | Human phenol sulfotransferase (STP1)
219543_at | 8.17E-05 | 2.014 | MAWBP | [illegible]



37966_at
200756_x_at
203167_at
219785_s_at
212650_at
200757_s_at
211924_s_at
205120_s_at
209043_at
200755_s_at
209204_at
213003_s_at
200934_at
221505_at
210074_at
214845_s_at
60474_at
202620_s_at
202236_s_at
219944_at


4.40E-05 


4.37E-05 


7.48E-05 


9.20E-05 


9.18E-05 


2.99E-06 


8.44E-06 


1.03E-05 


1.61E-05 


4.29E-05 


3.07E-05 


2.85E-05 


9.43E-05 


2.27E-05 


6.04E-05 


8.10E-06 


5.18E-06 


1.96E-05 


8.81E-05 


3.64E-05 


2.99E-06 


1.86E-05 


0.494  PARVB 


0.489  CALU 


0.487  TIMP2 


polyposis  locus  protein  1 


parvin, beta


calumenin 


tissue  inhibitor  of 
metalloproteinase  2 


0.485  MGC15419  MGC15419  protein 


0.478  NACSIN 


0.477  CALU 


0.463  PLAUR 


0.458  SGCB 


0.457  PAPSS1 


0.443  CALU 


0.429 LMO4


0.426 


0.411  PYGL 


0.4041  DEK 


0.378  ANP32E 


0.377  CTSL2 


0.37  PLOD2 


0.363  SLC16A1 


NPF/calponin-like  protein 


calumenin 


plasminogen  activator, 
urokinase  receptor 


sarcoglycan,  beta  (43kDa 
dystrophin-associated 


3'-phosphoadenosine 5'-
phosphosulfate synthase


smoothened  homolog 
Drosophila 


calumenin 


LIM  domain  only  4 


7i79f07.xl 

NCI  CGAP  Ovl8  Homo 


phosphorylase,  glycogen; 
liver  (Hers  disease. 


FLJ21069 


acidic  (leucine-rich) 
nuclear  phosphoprotein 


cathepsin  L2 


calumenin 


chromosome 20 open
reading frame 42


procollagen-lysine, 2-
oxoglutarate 5-


solute  carrier  family  16 
monocarboxylic  acid 


hypothetical  protein 
FLJ21069 




202134_s_at
216488_s_at
202619_s_at
218851_s_at
207675_x_at
201564_s_at
202784_s_at
209834_at
213260_at
208103_s_at
204285_s_at
209875_s_at
202235_at
204750_s_at
204286_s_at
209800_at


P value | Fold change (Class A v B) | Common name | Description
3.20E-05 | 0.332 | TAZ | transcriptional co-activator with PDZ-
2.33E-06 | 0.326 | ATP11A | ATPase, Class VI, type 11A
6.84E-06 | 0.318 | PLOD2 | procollagen-lysine, 2-oxoglutarate 5-
8.81E-05 | 0.315 | WDR33 | WD repeat domain 33
5.89E-05 | 0.313 | ARTN | artemin
1.53E-05 | 0.31 | MGC48332 | hypothetical protein MGC48332
3.22E-05 | 0.307 | FSCN1 | [description not captured]
9.90E-05 | 0.306 | POPDC3 | popeye domain containing 3
3.61E-05 | 0.293 | NNT | nicotinamide nucleotide transhydrogenase
5.16E-05 | 0.283 | SLC16A1 | solute carrier family 16 monocarboxylic acid
2.96E-05 | 0.249 | CHST3 | carbohydrate chondroitin 6
4.56E-05 | 0.228 | FOXC1 | forkhead box C1
6.99E-05 | 0.227 | ANP32E | acidic (leucine-rich) nuclear phosphoprotein
9.09E-09 | 0.222 | PMAIP1 | phorbol-12-myristate-13-acetate-induced protein 1
7.49E-06 | 0.217 | SPP1 | secreted phosphoprotein 1 (osteopontin, bone
1.95E-07 | 0.196 | SLC16A1 | solute carrier family 16 monocarboxylic acid
1.34E-05 | 0.144 | DSC2 | desmocollin 2
9.42E-07 | 0.141 | PMAIP1 | phorbol-12-myristate-13-acetate-induced protein 1
1.03E-05 | 0.121 | KRT16 | keratin 16 (focal non-epidermolytic
2.69E-05 | 0.0816 | SERPINB5 | serine (or cysteine) proteinase inhibitor, clade

[Figure residue: panel labels "Up in ER(+)" and "Up in ER(-)"; annotations ">3 fold" and "p < 0.0001".]

Figure  Legends 


Figure  1.  Molecular  heterogeneity  of  breast  cancers.  Two  way  hierarchical  clustering 
was  performed  with  99  primary  breast  cancers  based  on  1960  genes  with  the  greatest 
variance among samples. The dendrogram represents the relationship of samples. The
length of the branches represents 1 minus the correlation coefficient between samples. A
strongly  differentially  expressed  gene  cluster  is  enlarged  and  genes  associated  with 
estrogen  receptor  status  are  labeled.  Samples  are  arranged  in  columns  and  genes  in  rows. 
Expression  levels  are  pseudocolored  red  to  indicate  transcript  levels  above  the  median  for 
that  gene  across  all  samples  and  green  below  the  median.  Color  saturation  is  proportional 
to  the  magnitude  of  expression. 
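
The two steps named in this legend (keeping the genes with the greatest variance, then measuring sample dissimilarity as 1 minus the correlation coefficient) can be sketched as follows. This is a minimal illustration on toy numbers, not the authors' pipeline: the gene labels, the expression values, the use of the Pearson coefficient, and the function names are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, pvariance

def top_variance_genes(expr, k):
    """Return the k gene names with the largest variance across samples."""
    return sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)[:k]

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def sample_distances(expr, genes):
    """Pairwise sample distance = 1 - correlation over the selected genes."""
    n = len(next(iter(expr.values())))
    profiles = [[expr[g][i] for g in genes] for i in range(n)]
    return [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical toy data: gene label -> expression in four samples.
toy = {
    "gene_a": [8.0, 7.5, 1.0, 1.2],
    "gene_b": [7.0, 6.8, 2.0, 1.5],
    "gene_c": [1.0, 1.5, 6.0, 6.5],
    "gene_d": [5.0, 5.1, 5.0, 4.9],  # nearly flat, so it is filtered out
}
genes = top_variance_genes(toy, 3)
dist = sample_distances(toy, genes)
```

A distance matrix of this kind is the usual input to an agglomerative clustering routine; the linkage rule used for the figure is not stated in the legend, so none is assumed here.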

Figure  2.  Molecular  subclasses  of  ER(-)/PR(-)  breast  cancers.  A.  Two  way  hierarchical 
clustering  was  performed  with  41  ER(-)/PR(-)  breast  cancers  based  on  1366  genes  with 
greatest  variance  among  samples.  Samples  with  a  molecular  similarity  to  ER(+)  breast 
cancers  are  labeled  class  A  and  the  remaining  as  class  B.  A  gene  cluster  highly 
differentially  expressed  between  the  two  classes  is  enlarged  and  select  characterized  genes 
labeled.  B.  Three  dimensional  plot  of  ER(-)/PR(-)  primary  breast  cancers  based  on  the 
three  principal  components  representing  the  greatest  variance  in  gene  expression  across 
the  41  ER(-)/PR(-)  samples  identified  by  analysis  of  all  22,283  U133A  probe  sets. 
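
The three-component projection described in panel B can be illustrated with a small principal component sketch. It assumes a plain covariance-based PCA computed by power iteration with deflation, which is only practical for a handful of genes; the software and settings actually used for the figure are not given in the legend, and the sample values below are invented for the example.

```python
from math import sqrt

def center(rows):
    """Subtract each column (gene) mean so every feature has mean zero."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]

def covariance(x):
    """Gene-by-gene sample covariance of centered data."""
    n, m = len(x), len(x[0])
    return [[sum(x[s][i] * x[s][j] for s in range(n)) / (n - 1)
             for j in range(m)] for i in range(m)]

def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def top_components(cov, k, iters=500):
    """Return k (variance, unit vector) pairs, largest variance first."""
    a = [row[:] for row in cov]
    m = len(a)
    found = []
    for c in range(k):
        v = [1.0 / (i + c + 1) for i in range(m)]  # deterministic start vector
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(a, v)
            norm = sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(a, v)))
        found.append((lam, v))
        # Deflate: remove the component just found before the next pass.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return found

def project(x, components):
    """Coordinates of each sample on the given components."""
    return [[sum(p * q for p, q in zip(row, v)) for _, v in components]
            for row in x]

# Hypothetical toy data: six samples (rows) by three genes (columns).
samples = [
    [5.0, 5.0, 1.0], [6.0, 5.5, 1.2], [5.5, 6.0, 0.8],  # one group
    [1.0, 1.0, 5.0], [1.2, 0.8, 6.0], [0.8, 1.5, 5.5],  # another group
]
x = center(samples)
pcs = top_components(covariance(x), k=3)
scores = project(x, pcs)  # one 3-D point per sample
```

In this toy case the first coordinate alone separates the two groups of samples, which is the kind of separation the three-dimensional plot is meant to display.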

Figure 3. Immunohistochemical evaluation of differentially expressed genes.
Representative photomicrographs of immunohistochemistry studies for ALCAM in an
ER(-)  class  A  breast  tumor  (A)  and  an  ER(-)  class  B  breast  tumor  (B).  SPDEF  in  a  class  A 
breast  tumor  (C)  and  class  B  breast  tumor  (D).  AR  in  a  class  A  breast  tumor  (E)  and  a  class 
B  breast  tumor  (F). 

Figure  4.  Reproducibility  of  ER(-)  breast  cancer  subclasses.  Two  way  hierarchical 
clustering  was  performed  with  77  ER(-)  breast  tumors  from  an  independent  data  set  using 
1262  genes  with  greatest  variance  across  samples.  The  resulting  dendrogram  revealed  a 
tendency  to  group  samples  according  to  our  class  prediction  assignments.  A  strongly 
differentially  expressed  gene  cluster  is  enlarged  and  genes  associated  with  ER  status  and 
class  A  are  labeled. 

Figure 5. ER(-) class A breast cancer cells proliferate in response to androgen in an AR
dependent and ER independent manner. MDA-MB-453 cells were treated with reagents
as indicated and cell proliferation measured. All experiments were performed in triplicate. A.
Incubation with E2, the antiestrogens tam and ICI with or without E2, and vehicle control. B.
Incubation with the androgen R-1881, R-1881 with the AR antagonist flutamide, flutamide
alone, and vehicle control. C. Incubation with R-1881, R-1881 with tam, R-1881 with ICI, and
vehicle control.


Figure 6. Molecular subclasses of ER(-) breast cancer based on androgen responsive
genes. A. Three dimensional plot of the three principal components with the greatest
variance  across  41  ER(-)/PR(-)  primary  breast  tumors  using  497  genes  responsive  to 
androgen  in  the  class  A  cell  line  MDA-MB-453.  B.  Three  dimensional  plot  of  the  three 
principal  components  with  the  greatest  variance  among  77  ER(-)  breast  tumors  from  an 
independent  data  set  using  the  497  androgen  responsive  genes.  Samples  are  colored 
according  to  class  prediction  assignments. 

Supplementary  Figure  1.  Molecular  subclasses  of  breast  cancer  based  on  genes 
responsive  to  estrogen.  A  two  way  hierarchical  clustering  dendrogram  of  99  primary  breast 
cancers  was  performed  using  387  estrogen  responsive  genes  described  in  reference  29. 
Expression  levels  of  the  estrogen  receptor  RNA  and  selected  genes  responsive  to  estrogen 
and  over  expressed  in  class  A  are  depicted. 

Supplementary  Figure  2.  Two  way  hierarchical  clustering  of  99  primary  breast  cancers 
limited  to  genes  of  the  intrinsic  gene  list  of  reference  20.  Representative  clusters  of  genes 
corresponding  to  subtypes  described  are:  Luminal  A  gene  cluster  (1),  ERBB2  gene  cluster 
(2),  Normal  Breast  Like  gene  cluster  (3),  Basal  gene  cluster  (4),  and  Luminal  B  gene 
cluster  (5). 

By means of in vivo selection, transcriptomic analysis, functional verification and
clinical  validation,  we  identified  a  set  of  genes  that  marks  and  mediates  breast 
cancer  metastasis  to  lung.  Some  of  these  genes  serve  dual  functions,  providing 
growth  advantages  both  in  the  primary  tumor  and  in  the  lung  microenvironment. 
Others  contribute  to  aggressive  growth  selectively  in  the  lung.  Many  encode 
extracellular  proteins  and  are  of  previously  unknown  relevance  to  cancer 
metastasis. 

Metastasis is frequently a final and fatal step in the progression of solid malignancies.
Tumor cell intravasation, survival in circulation, extravasation into a distant organ,
angiogenesis, and uninhibited growth constitute the metastatic process (ref. 1). The molecular
requirements for some of these steps may be tissue-specific. Indeed, the proclivity that
tumors have for specific organs, such as breast carcinomas for bone and lung, was
noted over a century ago (ref. 2).

The identity and time of onset of the changes that endow tumor cells with these
metastatic functions are largely unknown and the subject of debate. It is believed that
genomic instability generates large-scale cellular heterogeneity within tumor
populations, from which rare cellular variants with augmented metastatic abilities evolve
through a Darwinian selection process (refs. 2, 3). Work on experimental metastasis using tumor
cell lines has demonstrated that re-injection of metastatic cell populations can enrich for
the metastatic phenotype (refs. 4-6). Recently, however, the existence of genes expressed by
rare cellular variants that specifically mediate metastasis has been challenged (ref. 7).
Transcriptomic profiling of primary human carcinomas has identified gene expression
patterns that, when present in the bulk primary tumor population, predict poor patient
prognosis (refs. 8-10). The existence of such signatures has been interpreted to mean that
genetic lesions acquired early in tumorigenesis are sufficient for the metastatic process,
and that consequently no metastasis-specific genes exist (ref. 7). However, it is unclear
whether these genes that predict metastatic recurrence are also functional mediators.

The lungs and bones are frequent sites of breast cancer metastasis, and metastases to
these sites differ in terms of their evolution, treatment, morbidity and mortality (ref. 11).
Reasoning  that  each  organ  places  different  demands  on  circulating  cancer  cells  for  the 
establishment  of  metastases,  we  sought  to  identify  genes  expressed  in  breast  cancer 
cells  that  selectively  mediate  lung  metastasis  and  that  correlate  with  the  propensity  of 
primary  human  breast  cancers  to  relapse  to  the  lung. 

Selection  of  cells  metastatic  to  lung 

The cell line MDA-MB-231 was derived from the pleural effusion of a breast cancer
patient suffering from widespread metastasis years after removal of her primary tumor (ref. 12).
Individual MDA-MB-231 cells grown and tested as single cell-derived progenies (SCPs)
exhibit distinct metastatic ability and tissue tropism (ref. 13) despite having similar expression
levels of genes constituting a validated Rosetta-type poor prognosis signature (ref. 9;
Supplementary Figure S1). These different metastatic behaviors, including different
tropisms to bone and lung, are associated with discrete variation in overall gene
expression patterns (Supplementary Figure S1; ref. 13). Thus, we hypothesized that
organ-specific metastasis must be determined by genes that are distinct from a
Rosetta-type poor prognosis signature and are differentially expressed within the MDA-MB-231
population. Indeed, previous work has demonstrated this to be the case for most of the
genes linked to the activity of bone metastatic subpopulations (refs. 4, 13).

To identify genes that mediate lung metastasis we tested parental MDA-MB-231 cells
and the 1834 sub-line (an in vivo isolate with no enhancement in bone metastatic
behavior; ref. 4) (Figure 1a) by tail vein injection into immunodeficient mice (Figure 1b).
Metastatic activity was assayed using bioluminescence imaging of luciferase-transduced
cells as well as gross examination of the lungs at necropsy. The 1834 cells
exhibited limited but significant lung metastatic activity compared to the parental
population (Figure 1b). When 1834-derived lung lesions were expanded in culture and
re-inoculated into mice, these cells (denoted as LM1 subpopulations; Figure 1a)
exhibited increased lung metastatic activity. Another round of in vivo selection yielded
second-generation populations (denoted LM2) that were rapidly and efficiently
metastatic to lung (Figure 1b). Histological analysis confirmed that LM2 lesions
replaced large areas of the lung parenchyma, whereas 1834 cells exhibited
intravascular growth with less extensive extravasation and parenchymal involvement
(Figure 1c). Inoculation of as few as 2x10^3 LM2 cells was sufficient for the emergence
of aggressive lung metastases whereas inoculation of 2x10^5 parental cells left only a
residual, indolent population in the lungs (Figure 1d). Furthermore, the enhancement in
lung metastatic activity was tissue-specific. When LM2 populations were inoculated into
the left cardiac ventricle to facilitate bone metastasis, their metastatic activity was
comparable to that of the parental and 1834 populations, and it was markedly inferior to
that of a previously described, highly aggressive bone metastatic population (Figure 1b).

Elucidation  of  a  lung  metastasis  signature 

To  identify  patterns  of  gene  expression  associated  with  aggressive  lung  metastatic 
behavior,  we  performed  transcriptomic  microarray  analysis  of  the  highly  and  weakly 
lung  metastatic  cell  populations.  The  gene  list  obtained  from  a  class  comparison 
between  parental  and  LM2  populations  was  filtered  to  exclude  genes  that  were 
expressed  at  low  levels  in  a  majority  of  samples  and  to  ensure  a  3-fold  or  higher  change 
in  expression  level  between  the  two  groups.  A  total  of  95  unique  genes  (113  probe  sets) 
met  these  criteria  with  48  overexpressed  and  47  underexpressed  in  cell  populations 
most  metastatic  to  the  lung  (Figure  2a;  Supplementary  Table  2).  This  gene  set  was 
largely  distinct  from  the  bone  metastasis  gene-expression  signature  previously  identified 
in bone metastatic isolates derived from the same parental cell line (ref. 4). In fact, only 6
genes  overlapped  with  concordant  expression  patterns  between  the  two  groups 
(Supplementary  Table  3). 
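
The filtering described in this paragraph (dropping genes expressed at low levels in a majority of samples, then requiring a 3-fold or greater change between groups) can be sketched as below. The sketch rests on stated assumptions: the low-expression cutoff `min_level` is a hypothetical value because the source does not give one, fold change is taken as a ratio of group means on a linear scale with positive values, and the gene labels and numbers are toy data.

```python
from statistics import mean

def signature_genes(parental, lm2, min_level=100.0, min_fold=3.0):
    """Split genes into over- and underexpressed sets (LM2 versus parental).

    parental, lm2: dict mapping gene -> list of positive expression values.
    min_level is an assumed cutoff; the source does not state its threshold.
    """
    over, under = [], []
    for gene in parental:
        values = parental[gene] + lm2[gene]
        # Exclude genes that are low in a majority of all samples.
        n_low = sum(v < min_level for v in values)
        if n_low > len(values) / 2:
            continue
        fold = mean(lm2[gene]) / mean(parental[gene])
        if fold >= min_fold:
            over.append(gene)
        elif fold <= 1.0 / min_fold:
            under.append(gene)
    return over, under

# Hypothetical toy measurements, three replicates per group.
parental = {
    "gene_up":   [100.0, 120.0, 110.0],
    "gene_down": [900.0, 950.0, 1000.0],
    "gene_flat": [500.0, 520.0, 480.0],
    "gene_low":  [10.0, 12.0, 8.0],
}
lm2 = {
    "gene_up":   [400.0, 450.0, 500.0],
    "gene_down": [200.0, 250.0, 220.0],
    "gene_flat": [550.0, 500.0, 530.0],
    "gene_low":  [60.0, 70.0, 50.0],
}
over, under = signature_genes(parental, lm2)
```

Note that `gene_low` changes about 6-fold yet is excluded, because it falls below the assumed expression cutoff in every sample; this is the purpose of the low-expression filter.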

Hierarchical  clustering  with  the  95-gene  list  confirmed  a  robust  relationship  between  this 
gene  expression  signature  and  the  lung-specific  metastatic  activity  of  in  vivo-selected 
cell populations (Figure 2a). In addition, this gene expression signature segregated the
SCPs  (which  were  not  used  in  generation  of  the  gene  list)  into  two  major  groups,  one 
transcriptomically  resembling  the  parental  cells,  and  the  other  more  similar  to  the  in 
vivo-selected  lung  metastatic  populations.  This  latter  group  of  SCPs  was  also  more 
metastatic  to  lung  than  the  former  group  (Figure  2b).  However,  unlike  the  LM2 
populations,  none  of  the  lung  metastatic  SCPs  concordantly  expressed  all  of  the  genes 
in  the  lung  metastasis  signature  (Figure  2a).  Consistent  with  this  observation,  the  lung 
metastatic  activity  of  the  LM2  populations  was  approximately  one  order  of  magnitude 
greater  than  the  most  aggressive  SCPs  (Figure  2b).  We  postulated  that  the  subset  of 
genes  from  the  95-gene  signature  that  are  uniformly  expressed  by  all  lung  metastatic 
SCPs  and  in  vivo-selected  populations  may  confer  baseline  lung  metastatic  functions, 
which  we  define  as  lung  metastagenicity.  Genes  that  are  expressed  exclusively  in  the 
most  aggressive  LM2  populations  may  serve  specialized,  lung-restricted  functions, 
which  we  collectively  denote  as  lung  metastatic  virulence.  A  final  list  of  54  candidate 
lung  metastagenicity  and  virulence  genes  was  selected  for  further  evaluation 
(Supplementary  Methods  and  Supplementary  Table  4). 
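
The distinction drawn above between metastagenicity and virulence genes can be expressed as a simple partition rule. The sketch below is one possible reading of the text, not the selection procedure of the Supplementary Methods, which is not reproduced here: the function name, the population labels, and the gene labels are hypothetical, and each gene is reduced to the set of populations in which it shows the signature pattern.

```python
def partition_signature(calls, lm2_pops, scp_pops):
    """Classify signature genes by where they show the signature pattern.

    calls: dict gene -> set of population labels showing the pattern.
    lm2_pops: labels of the in vivo-selected LM2 populations.
    scp_pops: labels of the lung metastatic single cell-derived progenies.
    """
    metastagenicity, virulence, unassigned = [], [], []
    for gene, pops in calls.items():
        in_all_lm2 = all(p in pops for p in lm2_pops)
        in_all_scp = all(p in pops for p in scp_pops)
        in_no_scp = not any(p in pops for p in scp_pops)
        if in_all_lm2 and in_all_scp:
            metastagenicity.append(gene)  # uniform across all lung metastatic cells
        elif in_all_lm2 and in_no_scp:
            virulence.append(gene)        # exclusive to the most aggressive cells
        else:
            unassigned.append(gene)
    return metastagenicity, virulence, unassigned

# Hypothetical labels for illustration only.
lm2_pops = ["LM2_x", "LM2_y"]
scp_pops = ["SCP_x", "SCP_y"]
calls = {
    "gene_1": {"LM2_x", "LM2_y", "SCP_x", "SCP_y"},
    "gene_2": {"LM2_x", "LM2_y"},
    "gene_3": {"LM2_x", "SCP_x"},
}
groups = partition_signature(calls, lm2_pops, scp_pops)
```

Genes that fit neither rule are kept in a third list rather than forced into a class, since the text does not say how such cases were handled.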

Genes  that  mediate  lung  metastasis 

A subset of biologically intriguing genes overexpressed in the 54 gene list was selected
for functional validation. These genes include the EGF family member epiregulin
(EREG), which is a broad-specificity ligand for the HER/ErbB family of receptors (refs. 14, 15), the
chemokine GRO1/CXCL1 (ref. 16), the matrix metalloproteinases MMP1 (collagenase I; ref. 17) and
MMP2 (gelatinase A; ref. 18), the cell adhesion molecule SPARC (ref. 19), the interleukin-13 decoy
receptor IL13Rα2 (ref. 20) and the cell adhesion receptor VCAM1 (refs. 21, 22) (Figure 2a). These
genes encode secretory or receptor proteins, suggesting roles in the tumor cell
microenvironment. In addition to these genes, we also included the transcriptional
inhibitor of cell differentiation and senescence ID1 (refs. 23, 24) and the
prostaglandin-endoperoxide synthase PTGS2/COX2 (ref. 25). Northern blot analysis of the various
in vivo-selected cell populations revealed expression patterns for these genes that correlated
with metastatic behavior (Figure 2c). SPARC, IL13Rα2, VCAM1 and MMP2 belong to
the subset of genes whose expression is generally restricted to aggressive lung
metastatic populations and are rarely expressed (less than 10% prevalence for VCAM1
and IL13Rα2, and less than 2% prevalence for SPARC and MMP2) among randomly
picked SCPs (data not shown). In contrast, the expression of ID1, CXCL1, COX2,
EREG, and MMP1 is not restricted to aggressive lung metastasis populations but
increases with lung metastatic ability. Analysis of protein expression for these genes
confirmed that the differences in mRNA levels translated into significant alterations in
protein levels (Supplementary Figure S2).

To  determine  if  these  genes  play  a  causal  role  in  lung  metastasis,  they  were 
overexpressed  via  retroviral  infection  in  the  parental  population  either  individually,  in 
groups  of  three,  or  in  groups  of  six  (Supplementary  Figure  S3).  Only  cells 
overexpressing ID1 alone were modestly more active at forming lung metastases when
compared  to  cells  infected  with  vector  controls  (Figure  3a).  Consistent  with  the 
hypothesis  that  metastasis  requires  the  concerted  action  of  multiple  effectors, 
combinations  of  these  genes  invariably  led  to  more  aggressive  metastatic  activity  and 
some  combinations  recapitulated  the  aggressiveness  of  the  4175  LM2  population 
(Figure  3b).  Triple  combinations  of  lung  metastasis  genes  in  parental  cells  did  not 
enhance  bone  metastatic  activity  (Supplementary  Figure  S4),  supporting  their  identity 
as  tissue-specific  mediators  of  metastasis.  The  necessity  of  some  of  these  genes  was 
tested  by  stably  decreasing  their  expression  in  4175  (LM2)  cells  with  short-hairpin  RNAi 
vectors  (Figure  3c).  Reduction  of  ID1,  VCAM1,  or  IL13Ra2  levels  decreased  the  lung 
metastatic  activity  of  4175  cells  by  more  than  10-fold  (Figure  3d).  These  effects  are  not 
due  to  activation  of  the  RNAi  machinery,  because  efficient  knock  down  of  another  gene, 
ROBO1, did not inhibit lung metastasis formation (data not shown). Collectively, the
results  show  that  these  nine  genes  are  not  only  markers  but  also  functional  mediators  of 
lung-specific  metastasis. 

The lung metastasis signature in primary tumors

A  biologically  meaningful  and  clinically  relevant  gene  profile  that  mediates  lung 
metastasis  should  be  uniquely  expressed  by  a  subgroup  of  patients  that  relapse  to  the 
lung  and  it  should  associate  with  clinical  outcome.  To  test  this,  a  cohort  of  82  breast 
cancer  patients  treated  at  our  institution  was  used  in  a  univariate  Cox  proportional 
hazards  model  to  relate  the  expression  level  of  each  lung  metastasis  signature  gene 
with  clinical  outcome.  Twelve  of  the  54  genes  are  significantly  associated  with  lung 
metastasis-free  survival,  including  MMP1,  CXCL1,  and  PTGS2  (Supplementary  Table 
5).  A  cross-validated  multivariate  analysis  using  a  linear  combination  of  each  of  the  54 
genes  weighted  by  the  univariate  results26  distinguished  patients  divided  into  a  high  or  a 
low  risk  group  for  developing  lung  metastasis  (10  year  lung  metastasis-free  survival  of 
56%  vs  89%,  p=0.0018;  see  Supplementary  Figure  S5)  but  not  bone  metastasis  (70% 
vs  79%,  p=0.31).  When  a  similar  multivariate  analysis  was  performed  by  weighting 
each  gene  by  a  t-statistic  derived  from  comparing  its  expression  between  the  LM2  cell 
lines with the parental MDA-MB-231 cells, the 54 genes again distinguished patients at
high risk for developing lung metastasis (62% vs 88%, p=0.01; see Supplementary
Figure  S5)  but  not  bone  metastasis  (75%  vs  79%,  p=0.49).  These  results  suggest  that 
a  clinically  relevant  subgroup  of  patients  express  certain  combinations  of  lung 
metastasis  signature  genes. 
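
The weighted linear combination described above can be sketched as follows. This is only an illustration, not the authors' implementation: the function names, the toy expression values, the toy weights, and the use of the median score as the high/low cutoff are all assumptions made for the example. The actual procedure is described in the Supplementary Methods.

```python
from statistics import median

def risk_scores(expression, weights):
    """Return one score per patient: the sum over genes of weight * expression.

    expression: list of per-patient lists of expression values (one per gene).
    weights: one weight per gene, e.g. a univariate Cox coefficient or a
             t-statistic (the values used below are hypothetical).
    """
    scores = []
    for patient in expression:
        if len(patient) != len(weights):
            raise ValueError("each patient needs one value per gene")
        scores.append(sum(w * x for w, x in zip(weights, patient)))
    return scores

def split_by_cutoff(scores, cutoff=None):
    """Label each score 'high' or 'low' risk.

    The default cutoff is the median score, which is an assumption of this
    sketch and not a value taken from the paper.
    """
    if cutoff is None:
        cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

# Toy data: 4 patients x 3 genes (made-up numbers, not from the study).
toy_expression = [
    [2.0, 1.0, 0.5],
    [0.5, 0.2, 1.5],
    [3.0, 2.5, 0.1],
    [1.0, 1.0, 1.0],
]
toy_weights = [1.0, 0.5, -1.0]
toy_scores = risk_scores(toy_expression, toy_weights)
toy_groups = split_by_cutoff(toy_scores)
```

The two resulting groups would then be compared by Kaplan-Meier analysis; a gene with a negative weight lowers the score when it is highly expressed.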

To  directly  determine  the  extent  to  which  breast  cancers  express  the  lung  metastasis 
signature in a manner resembling the LM2 cell lines, the 54 genes were used to
hierarchically cluster the MSKCC data set. Manual inspection of branches in the
dendrogram revealed a group of primary tumors that concordantly expressed many
elements  of  this  signature  (Figure  4a,  dashed  red  box).  In  particular,  a  subgroup  of 
primary  tumors  expressed  to  varying  degrees  a  majority  of  the  nine  genes  that  were 
functionally  validated.  Interestingly,  many  patients  that  developed  lung  metastasis  were 
among  this  group.  Tumors  in  this  group  predominantly  expressed  markers  of  clinically 
aggressive  disease  including  negative  estrogen  receptor/progesterone  receptor  status, 
a Rosetta-type poor-prognosis signature8, and a basal cell subtype of breast cancer27.
There  was  no  association  of  our  signature  with  high  HER2  expression.  A  molecularly 
similar  subgroup  of  breast  cancer  was  identified  when  the  clustering  analysis  was 
repeated  on  a  previously  published  Rosetta  microarray  data  set  of  breast  cancer 
patients9  (Supplementary  Figure  S6),  suggesting  that  the  findings  are  not  unique  to  our 
cohort  of  patients. 

Although the results of the hierarchical clustering are suggestive, this approach can lead
to arbitrary class assignments and is generally not ideal for class prediction28.
Therefore, we took advantage of the repeated observation of our signature in two
independent  data  sets.  For  training  purposes  the  Rosetta  data  set  was  used  to  define  a 
group  of  patients  expressing  the  lung  metastasis  signature  most  resembling  the  LM2 
cell lines (Supplementary Figure S7). All 48 of the 54 lung metastasis genes that
were  shared  between  the  MSKCC  and  Rosetta  data  set  microarray  platforms  were 
subsequently  utilized  to  generate  a  classifier  to  distinguish  these  tumors  from  the 
remaining  tumors  in  the  cohort  (Supplementary  Table  6).  This  classifier  was  then 
applied  to  the  MSKCC  cohort  to  identify  tumors  that  express  the  lung  metastasis 
signature  in  a  manner  resembling  the  LM2  cell  lines.  These  patients  had  a  markedly 
worse  lung  metastasis-free  survival  (p<0.001;  Figure  4b)  but  not  bone  metastasis-free 
survival  (p=0.15;  Figure  4b).  These  results  were  independent  of  ER  status  and 
classification  as  a  Rosetta-type  poor  prognosis  tumor  (Figure  4c).  Six  of  the  nine  genes 
that we tested in functional validation studies (MMP1, CXCL1, PTGS2, ID1, VCAM1,
and  EREG)  were  among  the  18  most  univariately  significant  (p<0.05)  genes  that 
distinguished  the  patients  used  to  train  the  classifier  (Supplementary  Figure  S7  cluster  3 
and  Table  1),  and  classification  using  only  these  18  genes  gave  similar  results  (data  not 
shown).  The  three  remaining  genes  (SPARC,  IL13RA2,  MMP2)  are  members  of  the 
lung  metastasis  virulence  subset  and  were  expressed  only  in  the  most  highly  metastatic 
cell  lines  in  our  model  system  (Figure  2d). 

Breast  tumorigenicity  and  lung  metastagenicity  partially  overlap 

How and when metastasis genes are acquired is unknown29. One explanation for the
expression  of  a  lung  metastasis  signature  in  a  subgroup  of  primary  breast  cancer  is  that 
these  genes  may  confer  a  growth  advantage  to  the  primary  tumor  while  allowing  growth 
at  distant  sites7.  To  test  this  hypothesis,  MDA-MB-231  cells  were  orthotopically  injected 
into the mammary fat pad of immunodeficient mice. We found that the 1834 (LM0) and
4175  (LM2)  cell  populations  were  progressively  more  aggressive  at  growing  in  the 
mammary  fat  pad  compared  with  the  parental  cell  line.  This  correlated  with  expression 
of  lung  metastagenicity  genes  (Figure  5a;  Figure  2c)  and  was  not  due  to  a  general 
enhancement  of  growth  because  the  4175,  1834,  and  parental  populations  had  a 
comparable ability to metastasize to bone (refer to Figure 1d). Furthermore, the 4175
and  1834  populations  were  also  more  metastatic  to  the  lungs  from  the  orthotopic  site 
after primary tumor resection, recapitulating the phenotypes observed using the tail
vein  metastasis  assay  (Figure  5b).  In  contrast,  the  virulently  bone  metastatic  population 
1833 (ref. 4) was only marginally more aggressive in the mammary fat pad compared to the
parental  cells  and  did  not  metastasize  to  lung  following  primary  tumor  resection  (Figures 
5a  and  5b). 

To  identify  which  of  the  genes  in  the  lung  metastasis  signature  may  be  conferring 
growth  at  the  primary  tumor  site,  we  quantified  mammary  fat  pad  tumor  growth  of  4175 
cell  populations  with  stable  knockdown  of  various  lung  metastasis  genes  that  were 
previously  assayed  for  effects  on  metastatic  behavior  (refer  to  Figures  3c  and  3d). 
Whereas  knockdown  of  IL13Ra2,  SPARC,  and  VCAM1  decreased  lung  metastatic 
ability  but  not  orthotopic  tumor  growth,  knockdown  of  ID1  resulted  in  a  statistically 
significant  reduction  in  both  (Figure  5c  and  Figure  3d).  These  data  suggest  that  some 
lung  metastasis  genes  facilitate  both  breast  tumorigenicity  and  lung  metastagenicity, 
whereas  others  confer  growth  advantages  exclusively  in  the  lung  microenvironment. 

We have identified a set of genes that mediates breast cancer metastasis to lung and
clinically  correlates  with  the  development  of  lung  metastasis  when  expressed  in  primary 
breast  cancers.  Many  of  the  genes  in  this  signature  have  not  been  previously  linked  to 
metastasis.  Together  with  the  bone,  the  lung  is  one  of  the  most  frequent  targets  of 
breast  cancer  metastasis  in  humans.  We  provide  evidence  that  these  two  sites  impose 
different  requirements  for  the  establishment  of  metastases  by  circulating  cancer  cells. 

In  addition  to  providing  clinical  validation,  potential  prognostic  tools  and  possible  targets 
for cancer treatment, the present findings shed new light on the biology of breast
cancer  metastasis. 

Many  of  the  genes  in  the  lung  metastasis  signature  are  frequently  expressed  in  all 
MDA-MB-231  subpopulations  that  metastasize  to  the  lung,  regardless  of  whether  these 
cells  were  randomly  picked  from  the  parental  cell  line  or  selected  in  vivo.  The  majority 
of  these  genes,  which  we  denote  as  promoting  lung  metastagenicity,  encode 
extracellular  products  including  growth  and  survival  factors  (e.g.  the  HER/ErbB  receptor 
ligand Epiregulin), chemokines (CXCL1), cell adhesion receptors (e.g. ROBO1) and
extracellular  proteases  (MMP1).  They  also  include  intracellular  enzymes  (e.g.  COX2) 
and  transcriptional  regulators  (e.g.  ID1),  as  well  as  several  intriguing  downregulated 
genes.  Their  expression  pattern  is  tightly  correlated  with  lung  metastatic  activity.  When 
tested  by  overexpression  in  poorly  metastatic  cells  or  by  RNAi-mediated  knockdown  in 
highly  metastatic  cells,  several  genes  in  this  group  function  as  mediators  of  lung 
metastasis  but  not  bone  metastasis.  Furthermore,  in  the  cohort  of  human  breast  cancer 
primary  tumors  examined,  those  expressing  the  lung  metastasis  signature  had  a 
significantly  worse  lung  metastasis-free  survival  but  not  bone  metastasis-free  survival. 
Therefore,  this  signature  appears  to  include  a  set  of  clinically  relevant  genes  that 
mediate  a  metastagenicity  function30,31  with  selectivity  to  the  lung. 

Recent  data  as  well  as  our  data  reveal  the  existence  of  metastasis  gene  signatures 
expressed  by  primary  tumors.  It  is  unclear  at  what  point  these  metastasis  gene 
signatures  are  acquired  during  the  process  of  tumorigenesis  since  the  selection 
pressure  for  this  acquisition  is  unknown.  One  possibility  is  that  elements  of  metastasis 
gene  signatures  may  play  a  role  in  primary  tumor  growth.  Consistent  with  this  idea,  the 
in  vivo  selected  cell  lines  expressing  the  lung  metastagenicity  signature  are  more 
tumorigenic  when  implanted  in  the  mammary  glands  of  mice.  Despite  promoting  growth 
in  the  mammary  gland  and  in  the  lung,  these  genes  are  not  general  mediators  of 
neoplastic  growth.  Therefore,  many  lung  metastasis  signature  genes  appear  to  enhance 
growth  both  within  the  breast  and  the  lung  (Figure  5d).  These  overlapping  functions 
may  explain  how  cells  expressing  genes  involved  in  metastasis  can  be  selected  for  in 
the  primary  tumor,  providing  insight  into  the  interpretation  of  primary  tumor  microarray 
data. 

Another  subset  of  the  lung  metastasis  genes  is  overexpressed  only  in  rare,  virulently 
metastatic  cells  selected  in  vivo.  Several  of  these  genes  mediate  lung  metastasis  in  our 
functional assays. Many in this class encode extracellular proteins (e.g. SPARC,
MMP2). With some exceptions (e.g. the receptors IL13RA2, VCAM1), this group of
genes  is  sporadically  expressed  in  human  primary  breast  tumors.  We  propose  that 
these  genes  act  mainly  as  virulence  genes30,31  that  may  allow  tumors  to  aggressively 
invade,  colonize,  and  grow  in  the  lung  without  markedly  contributing  to  primary  tumor 
growth  (Figure  5d).  As  such,  their  expression  may  be  rare  in  primary  tumors  but  strongly 
selected  for  once  such  cells  reach  the  lung.  Supporting  this  model,  a  recent  study 
analyzing  MMP2  expression  in  matched  primary  breast  cancers  and  pleural  effusions 
found  that  MMP2  levels  are  specifically  enriched  at  the  metastatic  site32. 

Breast  cancer  is  a  heterogeneous  disease  with  diverse  metastatic  behavior.  As  a 
consequence,  patients  differ  widely  in  prognosis  and  survival.  Attempts  to  molecularly 
classify  this  disease  have  yielded  several  useful  markers  of  poor  prognosis.  However,  to 
our  knowledge  none  of  these  markers  have  thus  far  been  shown  to  act  as  functional 
mediators  that  account  for  the  diversity  of  breast  cancer  metastases.  In  contrast,  our 
lung  metastasis  signature  seems  to  identify  poor-prognosis  patients  who  are  at  high  risk 
of  selectively  developing  lung  metastasis,  consistent  with  the  functional  testing  done 
experimentally. Further studies using additional patient cohorts, together with a delineation of the
role of these genes in specific steps of the metastatic process, should lead to a better
understanding of the biology of metastasis and its susceptibilities to treatment.

Experimental Procedures

Cell lines. The parental MDA-MB-231 cell line was obtained from the American Type
Culture Collection. Its derivative cell lines and SCPs were previously described4. Cells
were grown in high-glucose Dulbecco's modified Eagle's medium with 10% fetal bovine
serum. For bioluminescent tracking, cell lines were retrovirally infected with a triple
fusion protein reporter construct encoding herpes simplex virus thymidine kinase 1,
green  fluorescent  protein  (GFP)  and  firefly  luciferase13,33.  GFP-positive  cells  were 
enriched  by  fluorescence-activated  cell  sorting. 

Animal  studies.  All  animal  work  was  done  in  accordance  with  an  IACUC  approved 
protocol. Four- to six-week-old Balb/c nude mice (NCI) were used for all xenografting
studies. For lung metastasis formation, 2 × 10^5 viable cells were washed and harvested
in PBS and subsequently injected into the lateral tail vein in a volume of 0.1 mL.
Endpoint assays were conducted at 15 weeks post-injection unless significant morbidity
required that the mouse be sacrificed earlier. For bone metastasis, 1 × 10^5 cells in PBS
were  injected  into  the  left  ventricle  of  anesthetized  mice  (100  mg/kg  Ketamine;  10  mg/kg 
Xylazine)4.  Mice  were  imaged  for  luciferase  activity  immediately  after  injection  to 
exclude  any  that  were  not  successfully  xenografted. 

For  mammary  fat  pad  tumor  assays,  cells  were  harvested  by  trypsinization,  washed 
twice in PBS and counted. Cells were then resuspended (1 × 10^7 cells/ml) in a 50:50
solution of PBS and Matrigel. Mice were anesthetized, a small incision was made to
visualize the mammary gland and 1 × 10^6 cells were injected directly into the mammary
fat pad. The incision was closed with wound clips and primary tumor outgrowth was
monitored weekly by taking measurements of the tumor length (L) and width (W). Tumor
volume was calculated as $\frac{4}{3}\pi \times \frac{L}{2} \times \left(\frac{W}{2}\right)^{2}$. For metastasis assays, tumors were
surgically resected when they reached a tumor volume greater than 300 mm^3. After
resection,  the  mice  were  monitored  by  bioluminescent  imaging  for  the  development  of 
metastases. 
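
As a worked illustration of the volume formula and the 300 mm^3 resection threshold given above, a minimal sketch follows. The function and constant names are hypothetical, and the example measurements are invented; only the formula and the threshold come from the text.

```python
import math

# Resection threshold stated in the text (tumor volume greater than 300 mm^3).
RESECTION_THRESHOLD_MM3 = 300.0

def tumor_volume_mm3(length_mm, width_mm):
    """Ellipsoid volume 4/3 * pi * (L/2) * (W/2)^2.

    The measured width is used for both minor axes, as in the formula above.
    """
    if length_mm < 0 or width_mm < 0:
        raise ValueError("measurements must be non-negative")
    return 4.0 / 3.0 * math.pi * (length_mm / 2.0) * (width_mm / 2.0) ** 2

def ready_for_resection(length_mm, width_mm):
    """True once the calculated volume exceeds the resection threshold."""
    return tumor_volume_mm3(length_mm, width_mm) > RESECTION_THRESHOLD_MM3
```

For example, a tumor measuring 10 mm by 8 mm has a calculated volume of about 335 mm^3 and would exceed the threshold, whereas one measuring 8 mm by 6 mm (about 151 mm^3) would not.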

Bioluminescent imaging and analysis. Mice were anesthetized and retro-orbitally
injected  with  1.5  mg  of  D-luciferin  (15  mg/mL  in  PBS).  Imaging  was  completed  between 
2-5  minutes  post-injection  using  a  Xenogen  IVIS  system  coupled  to  Living  Image 
acquisition  and  analysis  software  (Xenogen).  For  BLI  plots,  photon  flux  was  calculated 
for  each  mouse  using  a  rectangular  region  of  interest  (ROI)  encompassing  the  thorax  of 
the  mouse  in  a  prone  position.  This  value  was  scaled  to  a  comparable  background 
value  (from  a  luciferin-injected  mouse  with  no  tumor  cells),  and  then  normalized  to  the 
value  obtained  immediately  post-xenografting  (day  0),  so  that  all  mice  had  an  arbitrary 
starting  BLI  signal  of  100. 
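
The normalization described above can be sketched as follows. The text does not specify how the photon flux was "scaled" to background, so this example assumes division by the background flux measured in the same imaging session; that choice, the function name, and the example numbers are all assumptions rather than the authors' procedure.

```python
def normalized_bli(flux_by_day, background_by_day):
    """Return {day: normalized signal}, with the day-0 signal set to 100.

    flux_by_day: thoracic ROI photon flux for one mouse, keyed by day.
    background_by_day: flux from a luciferin-injected, tumor-free mouse
        imaged in the same session, keyed by day.
    Scaling is done by division (an assumption of this sketch).
    """
    if 0 not in flux_by_day:
        raise ValueError("a day-0 measurement is required")
    scaled = {}
    for day, flux in flux_by_day.items():
        background = background_by_day[day]
        if background <= 0:
            raise ValueError("background flux must be positive")
        scaled[day] = flux / background
    baseline = scaled[0]
    if baseline <= 0:
        raise ValueError("day-0 signal must be positive")
    return {day: 100.0 * value / baseline for day, value in scaled.items()}

# Invented example values for a single mouse.
example = normalized_bli(
    {0: 2.0e6, 7: 1.0e6, 35: 8.0e6},
    {0: 1.0e4, 7: 1.0e4, 35: 2.0e4},
)
```

Expressing every mouse relative to its own day-0 signal makes curves comparable across animals that received slightly different effective cell doses.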

RNA  isolation,  labeling  and  microarray  hybridization.  Methods  for  RNA  extraction, 
labeling,  and  hybridization  for  DNA  microarray  analysis  of  the  cell  lines  have  been 
previously  described4.  For  the  primary  breast  tumor  data,  tissues  from  primary  breast 
cancers  were  obtained  from  therapeutic  procedures  performed  as  part  of  routine  clinical 
management.  Samples  were  snap  frozen  in  liquid  nitrogen  and  stored  at  -80°C.  Each 
sample  was  examined  histologically  using  hematoxylin  and  eosin  stained  cryostat 
sections.  Regions  were  manually  dissected  from  the  frozen  block  to  provide  consistent 
tumor  cell  content  of  greater  than  70%  in  tissues  used  for  analysis.  All  studies  were 
conducted  under  MSKCC  Institutional  Review  Board  approved  protocols.  RNA  was 
extracted  from  frozen  tissues  by  homogenization  in  TRIzol  reagent  (GIBCO/BRL)  and 
evaluated  for  integrity.  Complementary  DNA  was  synthesized  from  total  RNA  using  a 
T7-promoter-tagged-dT  primer.  RNA  target  was  synthesized  by  in  vitro  transcription  and 
labeled  with  biotinylated  nucleotides  (Enzo  Biochem,  Farmingdale,  NY).  Labeled  target 
was  assessed  by  hybridization  to  Test3  arrays  (Affymetrix,  Santa  Clara,  CA).  All  gene 
expression  analysis  was  carried  out  using  HG-U133A  GeneChip.  Gene  expression  was 
quantitated  using  MAS  5.0  or  GCOS  (Affymetrix).  All  microarray  data  has  been 
submitted  to  the  Gene  Expression  Omnibus  (GEO)  under  accession  number  GSE2603. 

Statistical  analysis.  The  Kaplan-Meier  method  was  used  to  estimate  survival  curves  and 
the  log-rank  test  was  used  to  test  for  differences  between  curves  using  WinSTAT  (R.  Fitch 
Software). The site of distant metastasis for the patients in the MSKCC data set was
determined from patient records. Patients classified as having lung metastasis developed metastasis
either only in the lung, or in the lung within months of metastasis to other sites. A detailed description of analytical
methods  used  in  the  paper  is  provided  in  the  Supplementary  Methods  section. 
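
For readers unfamiliar with the two methods named above, a minimal sketch of the Kaplan-Meier estimator and the two-group log-rank test follows. It is a generic textbook implementation in plain Python, not the WinSTAT procedure used in the study, and all function names are hypothetical.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times: follow-up time per patient; events: 1 if the event (e.g. lung
    metastasis) was observed, 0 if the patient was censored.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        n_events = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve

def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value) with 1 d.f."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)        # at risk, both groups
        n_a = sum(1 for x in times_a if x >= t)    # at risk, group A
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with one degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

Here a patient censored at time t is still counted as at risk at t, which is the usual convention.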

Descriptions  of  additional  experimental  procedures  used  are  available  in  the 
Supplementary  Methods  section  accompanying  the  paper  on  the  Nature  website. 

Table 1. Partial List of Lung Metastasis Signature Genes Used to Classify
Primary Breast Cancers Expressing the Lung Metastasis Signature.

p-value     UG cluster   Gene symbol   Description
<0.000001   Hs.118400    FSCN1         Fascin homolog 1, actin-bundling protein (Strongylocentrotus purpuratus)
<0.000001   Hs.83169     MMP1          Matrix metalloproteinase 1 (interstitial collagenase)
<0.000001   Hs.9613      ANGPTL4       Angiopoietin-like 4
0.000006    Hs.74120     C10orf116     Chromosome 10 open reading frame 116
0.00002     Hs.789       CXCL1         Chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha)
0.000355    Hs.196384    PTGS2         Prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase)
0.000444    Hs.185568    KRTHB1        Keratin, hair, basic, 1
0.000506    Hs.109225    VCAM1         Vascular cell adhesion molecule 1
0.000627    Hs.17466     RARRES3       Retinoic acid receptor responder (tazarotene induced) 3
0.001263    Hs.368256    LTBP1         Latent transforming growth factor beta binding protein 1
0.004365    Hs.444471    KYNU          Kynureninase (L-kynurenine hydrolase)
0.005179    Hs.421986    CXCR4         Chemokine (C-X-C motif) receptor 4
0.006426    Hs.77667     LY6E          Lymphocyte antigen 6 complex, locus E
0.007153    Hs.410900    ID1           Inhibitor of DNA binding 1, dominant negative helix-loop-helix protein
0.010871    Hs.255149    MAN1A1        Mannosidase, alpha, class 1A, member 1
0.032361    Hs.388589    NEDD9         Neural precursor cell expressed, developmentally down-regulated 9
0.03713     Hs.115263    EREG          Epiregulin
0.046859    Hs.98998     TNC           Tenascin C (hexabrachion)
There  are  48  unique  genes  shared  between  MSKCC  and  Rosetta  microarray  platforms. 
Patients  from  the  Rosetta  training  set  were  used  to  define  a  class  label  for  patients  that 
either  express  or  do  not  express  the  lung  metastasis  signature.  Shown  is  the  p-value  of 
a  t-test  comparing  the  difference  in  gene  expression  between  these  two  classes 
(Supplementary  Figure  S7,  cluster  3).  Only  18  genes  with  a  p-value  <  0.05  are  shown. 
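
The per-gene comparison behind the p-values in Table 1 can be illustrated with a short sketch. The text does not say which form of the t-test was used, so this example assumes Welch's unequal-variance t-statistic, and it stops at the statistic itself: converting it to a p-value requires the t distribution, which a statistics library would supply. Function names, gene labels, and expression values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two values")
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / se

def rank_genes(expression_by_gene, expresses_signature):
    """Return [(gene, t)] sorted by |t|, largest first.

    expression_by_gene: {gene: [value per patient]}.
    expresses_signature: one boolean class label per patient.
    """
    ranked = []
    for gene, values in expression_by_gene.items():
        pos = [v for v, lab in zip(values, expresses_signature) if lab]
        neg = [v for v, lab in zip(values, expresses_signature) if not lab]
        ranked.append((gene, welch_t(pos, neg)))
    return sorted(ranked, key=lambda item: abs(item[1]), reverse=True)

# Invented data: six patients, the first three labeled as expressing the signature.
toy_labels = [True, True, True, False, False, False]
toy_expression = {
    "GENE_B": [2.0, 3.0, 4.0, 2.0, 3.0, 4.0],
    "GENE_A": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
}
toy_ranking = rank_genes(toy_expression, toy_labels)
```

In this toy example GENE_A differs strongly between the two classes and ranks first, while GENE_B has identical values in both classes and a t-statistic of zero.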

Figure 1. Selection of breast cancer cells metastatic to lung. a, Flow chart of the
in vivo selection of organ-specific metastatic subpopulations indicating the organs from
which these subpopulations were isolated. Each subsequent lung metastatic
generation is designated LM0, LM1, and LM2. The LM2 cells were further analyzed for
metastasis by either tail vein (TV) or intracardiac (IC) xenografting. Metastatic
propensities for all cell lines used in this study are listed in Supplementary Table 1. b,
Representative lungs harvested at necropsy and bioluminescence imaging of the
indicated cell lines are shown after tail vein or intracardiac injection. c, Hematoxylin
staining of frozen sections of lungs from mice injected with moderately metastatic 1834
cells shows a mix of invading lesions (asterisk) and emboli within the vascular space
(arrowheads). Vascular walls are stained with the endothelial cell marker CD31. d, The
indicated numbers of parental cells and 4175 (LM2) cells were tested for lung metastatic
activity. Plots show a quantitation of the luminescence signal as a function of time. Data
are the average ±SEM for each cohort. (*) p < 0.05 using a one-sided rank test,
compared to mice injected with an equivalent number of parental cells.

Figure 2. A gene-expression signature associated with lung metastasis. a,
Comparison of gene expression profiles of LM2 populations with parental cells identifies
113 probe sets that correlate with lung metastatic activity. This signature clusters
in vivo-selected populations and single cell-derived progenies (SCPs) into groups that
resemble the LM2 cell lines (red bar), the parental MDA-MB-231 cell line (green bar), or
an intermediate group (blue bar). b, LM2 populations 4175 and 4142 were assayed for
lung metastatic activity as measured by BLI and compared to parental populations and
various SCPs13. Plots show a quantitation of the luminescence signal as a function of
time. Data are the average ±SEM for each cohort. Color-coding is as in panel a. c,
Northern blot analysis of parental, LM0, LM1, and LM2 cell lines using a set of nine lung
metastasis genes selected for functional validation, as well as four intriguing genes
underexpressed in the lung metastatic populations.

Figure 3. Genes in the expression signature mediate lung metastasis. a,b,
Retrovirus-mediated expression of selected genes from the lung metastasis signature in
weakly metastatic parental MDA-MB-231 cells. Genes were tested individually (a) or in
groups of 3 or 6 genes (b). c, Stable short hairpin RNAi constructs were retrovirally
introduced into 4175 lung metastatic cells, and their effectiveness at knocking down the
expression of their intended target was validated at the protein level (ID1, VCAM1,
SPARC) or mRNA level (IL13RA2). d, 4175 knockdown cell lines were xenografted via
the tail vein to assess lung metastatic activity. One shRNA vector against ID1 that was
ineffective at decreasing expression of this gene serves as a negative control. Data are
the average ±SEM for each cohort. (*) p < 0.05 using a one-sided rank test.

Figure 4. The lung metastasis signature in human primary breast tumors. a,
Hierarchical clustering of primary breast carcinomas from a cohort of 82 breast cancer
patients was performed using the 54 lung metastasis signature genes. A dendrogram of
the tumors is shown at the top, with tumors from patients that developed metastasis to the
lung (black circles) or to non-pulmonary sites (yellow circles) denoted. A subcluster
with a reproducibility index of 0.71 (dashed red box) groups tumors that tended
to express the lung metastasis signature in a manner resembling the LM2 cell lines. The
genes were also clustered and gene names are on the right. Functionally validated
genes are in red. The Rosetta poor prognosis signature is displayed with the genes
underexpressed (green bar) and overexpressed (red bar) in poor prognosis tumors
indicated on the left. The expression of HER2, progesterone receptor (PR), estrogen
receptor (ER), and basal and luminal keratins is shown. Expression of the lung
metastasis signature was confirmed in the independent Rosetta breast cancer cohort
(Supplementary Figure S6). b, Lung metastasis-free survival and bone metastasis-free
survival for MSKCC patients that either express (red line) or do not express (blue line)
the lung metastasis signature based on a classifier trained using the Rosetta cohort
(Supplementary Figure S7 and Supplementary Methods). The p-value for each survival
curve is shown. c, Lung metastasis-free survival restricted to patients with ER-negative
tumors or Rosetta-type poor prognosis tumors.

Figure 5. Breast tumorigenicity and lung metastagenicity partially overlap. a,
Representative MDA-MB-231 cell populations were injected into the mammary fat pad
of immunodeficient mice and monitored for tumor growth. Each curve designates mean
tumor volumes in cubic millimeters ±SEM. The number of mice in each cohort (n) is
indicated. b, As depicted in the schematic, mice were inoculated with the indicated
MDA-MB-231 cells into the mammary fat pad and tumors were removed after reaching
300 mm^3. Lung metastasis was monitored with BLI and normalized photon flux was
measured two weeks after removal of the primary tumor. (*) A mouse in the 4175 cohort
with an unusually high signal of 36400 was excluded. c, Growth in mammary fat pad of
highly lung metastatic 4175 (LM2) cells after stable shRNA knockdown of the indicated
genes. shControl refers to a cell line transduced with a short hairpin construct that did
not result in effective knockdown of its target gene. (**) p < 0.01 by a one-sided rank
test. d, A model of two classes of genes contained within the lung metastasis signature.
The first class (Subset A) confers both breast tumorigenicity and basal lung
metastagenicity. Examples may include ID1, CXCL1, PTGS2, and MMP1. The second
class (Subset B) confers functions specific to the lung microenvironment, facilitating
lung metastatic virulence. Examples may include SPARC and MMP2.

Supplementary Methods 1: additional experimental procedures used

Lung  histology.  Lungs  were  harvested  at  necropsy.  For  hematoxylin  and  eosin 
staining,  lungs  were  fixed  in  10%  neutral  buffered  formalin  overnight,  washed 
with  PBS  and  dehydrated  in  70%  ethanol  before  paraffin  embedding  (Histoserv). 
For  CD31  staining,  lungs  were  fixed  in  4%  paraformaldehyde  overnight  and 
treated  with  30%  sucrose  for  12-24  h  before  cryosectioning.  Staining  was 
performed  using  anti-CD31  antibody  (sc-1506,  Santa  Cruz  Biotechnology). 

Analysis  of  mRNA  and  protein  expression.  Total  RNA  from  subconfluent 
MDA-MB-231 cells was harvested using the RNeasy kit (Qiagen). Samples
were  electrophoresed  in  MOPS  buffer  and  transferred  to  a  Hybond  N+ 
membrane  (Amersham).  Radioactive  probes  for  Northern  blotting  were  derived 
from  fragments  of  the  relevant  cDNA,  and  hybridization  was  done  at  68°C  for  3  h. 

For immunoblotting, cells were washed with PBS and lysed in RIPA buffer (50
mM Tris-HCl pH 7.4, 1% NP-40, 0.25% Na-deoxycholate, 150 mM NaCl, 1 mM
EDTA) supplemented with 50 mM NaF, 20 mM β-glycerophosphate, and
complete protease inhibitor cocktail (Roche). Proteins were separated by SDS-PAGE,
and transferred to PVDF membranes that were immunoblotted with
antibodies against ID1 or VCAM1 (Santa Cruz Biotechnology), SPARC (R&D
Systems), and α-tubulin (Sigma). Secreted MMP-1, MMP-2 and CXCL1 were
analyzed in conditioned media using commercially available ELISA kits (R&D
Systems). Cells were plated in triplicate at 90% confluency in 6 well plates, and
conditioned media was collected 48 h later. Media was cleared of cells by
centrifuging at 2000 rpm for 5 min, and subsequently assayed for protein
concentration according to the protocols for the relevant ELISA kits.


Cell-surface IL13Rα2 and VCAM1 were analyzed by flow cytometry in cells
harvested with trypsin-EDTA and washed twice with cold PBS. Cells were incubated
with CyChrome-conjugated anti-human VCAM1 (BD Pharmingen),
phycoerythrin-conjugated anti-human IL13Rα2 (Cell Sciences), or control IgG in
FACS buffer (0.1% sodium azide and 1% bovine serum albumin in PBS) at
concentrations recommended by the supplier, for 1 h at 4 °C in the dark. Cells
were washed twice and re-suspended in cold FACS buffer. Flow cytometry data were
collected on a FACSCalibur (BD) instrument and analyzed using FlowJo software.

Overexpression and knockdown constructs. For overexpression studies, human
cDNAs of interest were cloned into pBabe-puro and/or pBabe-hygro retroviral
expression vectors. For single transductions, 20 µg of DNA was transfected into
the amphotropic GPG29 packaging cell line using Lipofectamine 2000 (Invitrogen)
at a ratio of 1:3 (µg DNA:µl Lipofectamine 2000). Virus-containing supernatants
were harvested daily between 48 and 96 h post-transfection. Media was
centrifuged at 2000 rpm for 5 minutes and subsequently cleared of remnant cells
using a 0.45 µm syringe filter (VWR). Filtered viral media was added to 70%
confluent MDA-MB-231 cells in the presence of 8 µg/ml polybrene (Sigma), and
incubated overnight. At 72 h post-infection, cell populations were treated with
either puromycin (Sigma) or hygromycin (Calbiochem). Expression of the relevant
transgenes was validated by Northern blot or protein expression analysis.

For combination overexpression experiments, plasmids for groups of three genes
carrying the same drug resistance marker were co-transfected into GPG29
packaging cells as described, but using 15 micrograms of each plasmid. Viral
harvesting and infection were identical to those described above. Sextet
transductions were generated as two sequential triple infections. Cells were
selected for the first drug resistance marker before being infected and selected
for the second resistance marker. The SPARC, ID1, and MMP1 triplet encoded a
puromycin-resistance marker, whereas the VCAM1, IL13RA2, and MMP2 as well as the
CXCL1, EREG, and COX2 triplets delivered hygromycin-resistance markers into the
recipient cells.

For knockdown experiments, short hairpin RNAi constructs were cloned into the
pRetroSuper plasmid according to previously published protocols34. Retroviral
infection of 4175 cells was achieved as described above for the overexpression
constructs. Multiple hairpin constructs were screened for effective knockdown of
the gene product of interest. The 19-nucleotide target sequences that resulted in
productive knockdown included: 5’-ggatcttgtgatctaaatc-3’ (SPARC),
5’-gaggaattacgtgctctgt-3’ (ID1), and 5’-ggtgaagacctatcgaaga-3’ (IL13RA2). For
knockdown of VCAM1, 4175 cells were sequentially infected and puromycin-selected
with two different pRetroSuper targeting constructs, encoding
5’-ggcagagtacgcaaacact-3’ and 5’-gtccctggaaaccaagagt-3’, respectively. Negative
control cell lines were generated by infecting with a pRetroSuper construct
targeting 5’-cggctgttactcacgcctc-3’, a sequence in the ID1 cDNA that did not yield
any appreciable knockdown of the protein product by Western blotting.


Supplementary  Methods  2:  description  of  analytical  methods  used 


Microarray  data  analysis  of  MDA-MB-231  cell  lines 

Analysis of transcriptomic profile heterogeneity within the MDA-MB-231 human
breast cancer cell line was performed using multidimensional scaling (MDS) of
single cell-derived progenies (SCPs) by importing the Affymetrix data into
BRB-ArrayTools 3.2 (developed by R. Simon and A.P. Lam,
http://linus.nci.nih.gov/BRB-ArrayTools.html). A list of 1267 genes that were
differentially expressed across the SCPs13 was then used in MDS to separate the
SCPs based on Pearson correlation as the similarity measure.

To identify genes associated with lung metastasis among the in vivo selected
MDA-MB-231 cells, class labels were assigned based on lung metastatic
behavior. A class comparison using a t-test (GeneSpring 6.1) was done between
the gene expression data for the second-generation in vivo selected lung
metastatic populations (LM2) 4173, 4175, and 4180 and two different passages
of parental MDA-MB-231 cells (ATCC) to generate an initial list of genes that were
differentially expressed between the two classes with a p-value less than 0.05.
The list was further filtered to eliminate absent genes or genes expressed at
low levels. This was done by removing genes with an absent flag in all the
samples and genes with a raw expression score of less than 200. An additional
filter was applied to ensure at least a three-fold change in expression level
between LM2 and ATCC, resulting in a final list of 113 gene probe
sets (corresponding to 95 unique genes) associated with lung metastasis. Using
these genes, hierarchical clustering was performed on a cohort of in vivo
selected cell lines in addition to the SCPs, which were not directly used in the
initial class comparison.
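
The filtering steps described above can be sketched in a few lines of Python. This is only an illustration, not the GeneSpring procedure itself: the `ProbeSet` record and `passes_filters` function are hypothetical names, the fold change is assumed to be the ratio of mean LM2 to mean parental expression, and "raw expression score of less than 200" is assumed to mean that no sample reaches 200.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ProbeSet:
    probe_id: str
    p_value: float   # t-test p-value, LM2 versus parental
    lm2: list        # raw expression scores in the LM2 samples
    parental: list   # raw expression scores in the parental samples
    absent: list     # one absent-call flag per sample


def passes_filters(ps, p_max=0.05, min_raw=200.0, min_fold=3.0):
    """Apply the p-value, absent-flag, expression-level and fold-change filters."""
    if ps.p_value >= p_max:
        return False
    if all(ps.absent):                       # absent in every sample
        return False
    # Assumption: "expressed at low levels" means no sample reaches min_raw.
    if max(ps.lm2 + ps.parental) < min_raw:
        return False
    # Assumption: fold change is the ratio of class means (raw scores > 0).
    ratio = mean(ps.lm2) / mean(ps.parental)
    return ratio >= min_fold or ratio <= 1.0 / min_fold
```

A probe set is kept only if it passes every filter, so both strongly overexpressed and strongly underexpressed probe sets survive.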

The partial expression of the 113-gene profile by the moderately lung metastatic
SCPs suggests that this 113-gene list contains genes associated with baseline
lung metastatic ability (lung metastagenicity), and genes that enhance this
baseline behavior (lung metastatic virulence). We reasoned that lung
metastagenicity genes should be differentially expressed by both the LM2
populations and the lung metastatic SCPs. Thus, a list of 59 candidate lung
metastagenicity genes (50 unique) was generated by taking the intersection of
the 113 genes with the 1267 genes differentially expressed across the SCPs.

We reasoned that the remaining genes either represent virulence genes
restricted to the most aggressive lung metastatic populations or represent false
discoveries. To help distinguish between these possibilities we applied a more
stringent filter to the parental versus LM2 class comparison and also compared
the LMO populations to LM2. This resulted in a list of 42 candidate lung
metastatic virulence genes (32 unique) generated by taking genes with either a
six-fold difference between parental and LM2 populations, or a three-fold
difference between LMO and LM2. The metastagenicity and virulence gene lists
overlapped, as some genes were associated with lung metastagenicity and
had expression that was further increased in the LM2 populations. Nine
biologically intriguing genes from either list were selected for functional
validation. A final list of lung metastagenicity and virulence candidate genes was
generated by combining the 59-gene lung metastagenicity list with those of the
nine genes selected for functional validation that were not already on the list,
for a total of 65 genes (54 unique). This list is shown in Supplementary Table 4.

Univariate  and  multivariate  analysis  of  genes  comprising  the  lung 
metastasis  gene  signature 

To determine which of the 54 unique genes of the lung metastasis
signature are associated with lung metastasis-free survival, we utilized a
microarray dataset from 98 primary breast cancer patients treated at our
institution. We excluded patients with incomplete clinical annotations and/or
less than three years of clinical follow-up, resulting in 82 analyzable
samples. At the time of tumor resection, these patients had an average age of
55.8 years (SD = 13.5 yrs), average tumor size of 3.68 cm (SD = 1.77 cm), and
an average of 3.5 positive axillary lymph nodes (SD = 5.98). The vast majority of
these patients received adjuvant chemotherapy and/or hormonal therapy.

For univariate analysis, each of the 54 unique genes of the lung metastasis
signature was related to lung metastasis-free survival based on the Cox
proportional hazards regression model. This process was also repeated for bone
metastasis-free survival. The results of this analysis are shown in
Supplementary Table 5. For multivariate analysis, the method of Beer et al.26 was
used. In a leave-one-out cross-validated (LOOCV) manner, all 54 unique genes
were used to generate a risk index for lung metastasis. In each round, using only
the training cases, this risk index was defined as a linear combination of gene
expression values weighted by their estimated Cox model regression
coefficients. The risk index of the single left-out test case was then determined.
If the risk index for the test case was in the top 20th percentile of the risk
index scores, then it was termed high-risk. Otherwise, it was termed low-risk.
The 20th percentile was used as a cut-off because about 20 percent of the cases
were expected to eventually develop lung metastasis.
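
As a rough sketch of this cross-validated risk index, the Python below fits a univariate Cox coefficient per gene on the training cases, sums coefficient-weighted expression values, and labels the left-out case. It is an illustration under stated assumptions rather than the implementation used in the study: the function names are hypothetical, the Cox fit is a minimal damped Newton-Raphson routine with Breslow handling of ties, and the cut-off is assumed to be taken from the ranked risk indices of the training cases.

```python
import math


def cox_coefficient(x, time, event, max_iter=25):
    """Univariate Cox regression coefficient by damped Newton-Raphson."""
    beta = 0.0
    n = len(x)
    for _ in range(max_iter):
        grad = 0.0
        info = 0.0
        for i in range(n):
            if not event[i]:
                continue                      # censored cases add no term
            at_risk = [j for j in range(n) if time[j] >= time[i]]
            w = [math.exp(beta * x[j]) for j in at_risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, at_risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, at_risk))
            grad += x[i] - s1 / s0            # score contribution
            info += s2 / s0 - (s1 / s0) ** 2  # observed information
        if info <= 1e-12:
            break
        step = max(-1.0, min(1.0, grad / info))  # damped for stability
        beta += step
        if abs(step) < 1e-8:
            break
    return beta


def loocv_risk_groups(expr, time, event, top_fraction=0.20):
    """expr[g][s] is the expression of gene g in sample s."""
    n = len(time)
    calls = []
    for left_out in range(n):
        train = [s for s in range(n) if s != left_out]
        t = [time[s] for s in train]
        e = [event[s] for s in train]
        coefs = [cox_coefficient([row[s] for s in train], t, e) for row in expr]

        def risk_index(s):
            return sum(b * row[s] for b, row in zip(coefs, expr))

        ranked = sorted(risk_index(s) for s in train)
        # Assumption: the cut-off comes from the training-case scores.
        cutoff = ranked[math.ceil((1.0 - top_fraction) * len(ranked)) - 1]
        calls.append("high" if risk_index(left_out) > cutoff else "low")
    return calls
```

Because the coefficients are re-estimated in every round, the left-out case never contributes to the weights used to score it.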

Weighting  each  gene  by  its  estimated  Cox  model  coefficient  for  lung  metastasis 
is  a  way  to  test  the  ability  of  the  54  genes  to  predict  clinical  high  risk  groups.  A 
complementary  approach  is  to  test  the  ability  of  the  genes  to  predict  a  biological 
group  similar  to  the  LM2  cell  lines  to  see  if  this  group  is  at  high  risk  for  developing 
lung  metastasis.  These  two  methods  may  not  necessarily  give  the  same  results 
because  each  gene  is  weighted  differently.  For  example,  if  many  genes  that 
better  distinguish  LM2  from  the  parental  cell  lines  are  not  clinically  meaningful, 
the two classifiers could give different results. To classify each of the 82
samples in the MSKCC cohort as resembling either the LM2 cell lines or the
parental MDA-MB-231 cell lines, a compound covariate classifier (BRB-ArrayTools
3.2) was used. Class membership in these two groups was determined
by using the 54-gene lung metastasis signature. A compound covariate value
was defined as a linear combination of gene expression values weighted by a
t-statistic derived from comparing the LM2 cell lines (4173, 4175, and 4180) with
two different passages of the parental MDA-MB-231 cell line (ATCC). The
classification threshold was set as the midpoint between the mean compound
covariate values of the samples in the LM2 class and in the ATCC class.
Each of the 82 MSKCC samples was then predicted to be in the LM2 class if its
compound covariate was closer to the LM2 class value or to be in the ATCC
class if closer to the ATCC class value. Class survival analysis for lung
metastasis-free survival and bone metastasis-free survival for the two classes of
patients was then performed using the log-rank test.
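
A compound covariate classifier of this kind can be sketched as follows. This is a minimal illustration, not the BRB-ArrayTools implementation: the function names are hypothetical, a pooled-variance two-sample t-statistic is assumed for the weights, and every gene is assumed to have non-zero variance within the training classes.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance two-sample t-statistic (assumes non-zero variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))


def train_compound_covariate(class_a, class_b):
    """class_a and class_b are lists of samples, each a list of gene values."""
    n_genes = len(class_a[0])
    weights = [
        t_statistic([s[g] for s in class_a], [s[g] for s in class_b])
        for g in range(n_genes)
    ]

    def covariate(sample):
        return sum(w * v for w, v in zip(weights, sample))

    # Threshold: midpoint between the two class means of the covariate.
    midpoint = (mean(covariate(s) for s in class_a)
                + mean(covariate(s) for s in class_b)) / 2.0
    return weights, midpoint


def predict(sample, weights, midpoint, labels=("LM2", "ATCC")):
    """Assign the class whose mean compound covariate is closer."""
    value = sum(w * v for w, v in zip(weights, sample))
    return labels[0] if value > midpoint else labels[1]
```

Because each weight has the same sign as the corresponding difference in class means, the first class always has the larger mean compound covariate, so comparing against the midpoint is equivalent to choosing the closer class value.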

Clustering  of  primary  breast  tumor  data 

Using each gene of our lung metastasis signature in a linear combination, as
described above, may have limitations in an analysis for a
metastasis gene signature. One reason is that different tumors in a high-risk
group may have different combinations of individual genes. Furthermore, an
experimentally derived signature will likely contain features that are peculiar to
the experimental system. In our case, we were hypothesizing that some of the
genes in the experimental lung metastasis signature were serving as rare
metastatic virulence genes, making it unlikely that they would be expressed by a
bulk primary tumor population. Thus, to analyze the extent to which expression
of the lung metastasis signature was similar to the LM2 cell lines, we applied
unsupervised clustering methods using both the MSKCC data set and a second
data set (Rosetta) composed of 78 primary tumors9. The Rosetta data set
utilized the Rosetta microarray platform. We were able to map 48 Affymetrix
probe sets from our 54 unique genes to this platform. One of the 78 Rosetta
samples (sample 54) was omitted from the analysis because of a high number of
missing values for many of our genes of interest.

Using BRB-ArrayTools 3.2, we performed hierarchical clustering to search for
subgroups of patients that express the lung metastasis genes in a manner similar
to the LM2 cell lines. A cluster reproducibility index R was used to evaluate the
robustness of the clusters35. The R measure is based on perturbing the
expression data with Gaussian noise, re-clustering, and measuring the similarity
of the new clusters to the original clusters. For a cluster of the original data,
the R measure is the proportion of pairs of samples in that cluster that remain
in the same cluster after perturbation and re-clustering, averaged over all pairs
of samples and all perturbations. Clusters with a high R value were identified and
manual inspection appeared to reveal a group of primary breast cancers with
concordant expression of many of the lung metastasis genes.

We also wanted to relate expression of the lung metastasis signature to the
Rosetta poor-prognosis gene-expression signature9, estrogen receptor (ER)
status, progesterone receptor (PR) status, HER2 status, and the basal/luminal
breast cancer subtypes27,36. Mapping of the 70-gene poor-prognosis gene
signature from the Rosetta platform to the Affymetrix platform resulted in 57
shared genes. ER and PR status was visualized using estrogen receptor alpha
and progesterone receptor probes present on the Affymetrix U133A GeneChip or
the Rosetta platform. HER2 status was determined by probes for ERBB2 and for
GRB7 (ref. 37). Probes for keratin 5 and keratin 17 were used as markers for the
basal cell subtype and probes for keratin 8 and keratin 18 for the luminal
subtype27. The heatmap used to visualize gene expression was arranged so that
the sample order was the same as determined by the hierarchical clustering
results mentioned above.

Class  prediction 

From  the  MSKCC  and  the  Rosetta  data  sets,  it  appeared  that  there  exists  a 
breast  cancer  subgroup  of  predominantly  ER  negative,  poor  prognosis,  basal 
cell-like  breast  cancers  that  concordantly  express  many  elements  of  the  lung 
metastasis  signature.  Although  useful  for  class  discovery  and  analyzing 
relationships  among  clusters  of  genes,  hierarchical  clustering  is  not  a  statistical 
method  for  making  class  assignments.  This  is  because  partitioning  samples  into 
groups  by  inspection  can  be  arbitrary  and  it  does  not  provide  a  useful  class 
predictor  for  new  cases.  However,  recent  work  has  described  using  class 
prediction  methods  for  cancer  subgroups  defined  by  unsupervised  clustering 
across  data  sets38.  Thus,  we  took  advantage  of  the  observation  that  a  partial 
lung  metastasis  signature  is  expressed  in  two  independent  data  sets. 

The Rosetta data set was used as the training set to define the class labels used
for prediction. We wished to identify two classes: samples that either did or did
not express the lung metastasis signature in a manner resembling the LM2 cell
lines. Normalized data was imported into TIGR MultiExperiment Viewer 3.0.3
(ref. 39) and the genes were median centered. The method of K-means was used
to partition the training set based on the 48 genes of the lung metastasis
signature shared by both microarray platforms. Choosing the right number of
clusters for K-means clustering is not obvious and is a long-standing problem.
We estimated the K value based on a figure of merit40, which assesses the
predictive power of clustering using a left-out sample. This showed that a cluster
number up to four resulted in a sharp decline in the figure of merit (lower score is
better) and cluster numbers greater than this tended to show a higher error. To
control for variation in results due to random initializations of the K-means
algorithm, we also used K-means support, which produces consensus K-means
clusters after multiple runs39. Thus, the initial cluster number was set to four with
50 runs per iteration, the threshold percentage of occurrence in the same cluster
was set at 70%, and 2000 K-means iterations were performed. Under these
conditions four consensus clusters were produced and 36 of the 77 samples
were unassigned.

The expression of the lung metastasis signature for each of the four consensus
clusters was then evaluated for similarity to the LM2 cell lines by calculating the
Pearson correlation between the cluster centroids and the centroid for the LM2
cell lines (Supplementary Figure S7). The mean centered gene expression data
for the LM2 cell lines (4173, 4175, 4180) and two different passages of the
MDA-MB-231 parental cells was used to calculate the LM2 centroid. From this
analysis, cluster 3 had a Pearson correlation of 0.19 while the other clusters
(including the unassigned samples) were anti-correlated (Supplementary Figure
S7). Thus, the 13 members of cluster 3 were defined as a robust subgroup of
tumors expressing the lung metastasis signature and all other samples were
labeled as not expressing this signature. Repeated analysis with different
parameters used in K-means clustering confirmed the robustness of
membership in these classes.

Because the 78-sample Rosetta training set and the 98-sample MSKCC test set
were on different microarray platforms, both data sets were z-score
transformed41. This was accomplished by taking the log2 transformed
expression value of each gene, subtracting the mean expression value of that
gene, and dividing this difference by the standard deviation. Each z-score
transformed data set was then imported into BRB-ArrayTools 3.2. To guard
against peculiarities of different class prediction methods, we used multiple
predictors including 1-nearest neighbor, nearest centroid, and support vector
machine with linear kernel and default penalty costs. In leave-one-out
cross-validation each class prediction method correctly classified 95-96% of the
Rosetta samples. Each of the 82 analyzable samples in the MSKCC data set
was then classified to predict which belonged to the lung metastasis signature
class. Results for each of the three prediction methods were similar. We used
the consensus results, i.e. the class assigned by at least two of the three
methods. Survival analysis for lung metastasis-free and bone metastasis-free
survival was then performed using the log-rank test.
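
The per-gene z-score transformation and the two-out-of-three consensus rule can be illustrated with a short Python sketch. The function names are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
from collections import Counter
from statistics import mean, stdev


def zscore_by_gene(matrix):
    """matrix[g][s] is the log2 expression of gene g in sample s.

    Each gene is centred on its own mean and scaled by its own standard
    deviation, which puts data from different platforms on a common scale.
    """
    scaled = []
    for row in matrix:
        m = mean(row)
        sd = stdev(row)  # assumption: sample SD; requires sd > 0
        scaled.append([(v - m) / sd for v in row])
    return scaled


def consensus_call(calls):
    """Majority vote across predictors, e.g. two out of three."""
    return Counter(calls).most_common(1)[0][0]
```

For example, `consensus_call(["signature", "signature", "other"])` returns `"signature"`, mirroring the rule of accepting the class chosen by two of the three predictors.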

In an alternative approach to training the classifiers, we directly compared the
lung metastasis signature centroid for the LM2 cell lines with each of the samples
in the Rosetta data set using a Pearson correlation. This resulted in a range of
correlations from -0.33 to 0.33. We selected an 80th percentile threshold
corresponding to a correlation of greater than 0.15. The 16 samples above this
threshold were then used in training for class prediction. In LOOCV, the class
prediction methods correctly classified 68-92% of the samples, with 1-nearest
neighbor being the worst and support vector machine being the best. Results after
classification of the MSKCC data set were comparable to the K-means based
classifier.
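
The centroid-correlation approach can be sketched as below. This is an illustrative outline with hypothetical function names; in particular, how the 80th percentile was computed is not stated in the text, so the rank-based cut-off here is an assumption.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length expression vectors."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def centroid(samples):
    """Gene-wise mean over a list of samples (each a list of gene values)."""
    return [sum(col) / len(col) for col in zip(*samples)]


def top_correlated(samples, reference, percentile=80):
    """Return indices of samples whose correlation exceeds the percentile cut-off."""
    r = [pearson(s, reference) for s in samples]
    ranked = sorted(r)
    # Assumption: rank-based percentile of the observed correlations.
    cutoff = ranked[math.ceil(percentile / 100 * len(ranked)) - 1]
    return [i for i, value in enumerate(r) if value > cutoff], r
```

The samples returned by `top_correlated` would then serve as the positive class when training the class predictors.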

Rosetta  poor  prognosis  classification 

We were able to map 54 of the 70 Rosetta poor prognosis signature genes to the
Affymetrix U133A platform. To ensure that this reduction in gene number does
not significantly reduce the prognostic performance of the full signature we
repeated the analysis of van’t Veer et al.9 using only the 54 genes that are also
present on the Affymetrix platform. Using all 70 genes, 3 out of 34 poor
prognosis cases were misclassified and 11 out of 44 good prognosis cases were
misclassified (this was one fewer misclassification than reported by van’t Veer et
al.). Using the reduced subset of 54 genes, 5 poor prognosis cases were
misclassified and 11 good prognosis cases were misclassified. Thus, the
reduction in the signature had little impact on the performance of the classifier.

Each of the 82 breast cancer primaries from the MSKCC data set was assigned
as having either a good prognosis signature or a poor prognosis signature. The
method used by van’t Veer et al. used binary data based on 5-year
metastasis-free survival. Fourteen of the 82 MSKCC cases did not have at least
five years of follow-up and had to be excluded. For the remaining 68 cases, the
van’t Veer analysis, including LOOCV, was performed on z-transformed Affymetrix
data. Classification was based on correlation with the good prognosis signature.
While van’t Veer used a threshold of about 0.3 (the value used was not explicitly
stated in their methods), we used 0. The results were that 5 out of 22 (23%) poor
prognosis cases were misclassified and 19 out of 46 (41%) good prognosis
cases were misclassified. The success of this classification was unlikely to be
due to chance (p=0.001 based on 1000 permutations). The remaining 14 cases
were classified in a similar manner, except using the 68 with 5-year survival as a
training set. In this way, all 82 were classified as good or bad prognosis.
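
A permutation p-value of the kind quoted above can be sketched as follows. This is a simplified illustration with a hypothetical function name: it shuffles the outcome labels against fixed predictions, whereas the original analysis may have repeated the full cross-validated classification for each permutation, which the text does not specify.

```python
import random


def permutation_p_value(labels, predictions, n_perm=1000, seed=0):
    """Chance of matching at least as many labels after shuffling them."""
    rng = random.Random(seed)
    observed = sum(a == b for a, b in zip(labels, predictions))
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        correct = sum(a == b for a, b in zip(shuffled, predictions))
        if correct >= observed:
            at_least_as_good += 1
    # Counting the observed labelling itself keeps the estimate above zero.
    return (at_least_as_good + 1) / (n_perm + 1)
```

With 1000 permutations and this correction, the smallest attainable p-value is 1/1001, or about 0.001.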

Clinical  annotations,  gene  lists,  and  results  of  class  assignments  and  predictions 
are  collated  in  a  workbook  supplied  as  supplementary  information. 


Supplementary  Table  1:  Cell  populations  used  in  metastasis  assays. 

Metastatic  propensity  to  bone  and  lung  for  all  in  vivo  selected  and  single  cell- 
derived  populations  used  in  the  study. 


Cell Line   Lung Metastatic Activity   Bone Metastatic Activity
Parental    -/+                        +
1833        -/+                        +++
1834        +                          +
3475        ++                         +
3481        ++                         +
2293        +                          +
2295        +                          +
4142        +++                        +
4173        +++                        +
4175        +++                        +
4180        +++                        +
SCP 2       -/+                        +++
SCP 3       +                          +
SCP 6       -                          -/+
SCP 21      -                          -
SCP 25      -/+                        ++
SCP 26      -                          -
SCP 28      +                          +++
SCP 32      +                          +
SCP 43      +                          +
SCP 46      -/+                        +++


Supplementary Table 2: Class comparison between parental MDA-MB-231
and LM2 cell lines selected to be highly metastatic to lung. Shown are 95
unique genes from 113 Affymetrix probe sets. Yellow marks 61 overexpressed
probe sets and blue marks 52 underexpressed probe sets after a three-fold filter
was applied.


Probe set     Fold change  Gene symbol         Gene title
200665_s_at   407.01       SPARC               secreted protein, acidic, cysteine-rich (osteonectin)
203029_s_at   147.27       PTPRN2              protein tyrosine phosphatase, receptor type, N polypeptide 2
203030_s_at   97.07        PTPRN2              protein tyrosine phosphatase, receptor type, N polypeptide 2
207442_at     58.71        CSF3                colony stimulating factor 3 (granulocyte)
206172_at     48.52        IL13RA2             interleukin 13 receptor, alpha 2
206785_s_at   33.05        KLRC1 /// KLRC2     killer cell lectin-like receptor subfamily C, member 1 /// killer cell lectin-like receptor subfamily C, member 2
202310_s_at   20.03        COL1A1              collagen, type I, alpha 1
211534_x_at   15.67        PTPRN2              protein tyrosine phosphatase, receptor type, N polypeptide 2
221261_x_at   14.65        MAGED4              melanoma antigen, family D, 4 /// melanoma antigen, family D, 4
202947_s_at   13.50        GYPC                glycophorin C (Gerbich blood group)
204475_at     13.35        MMP1                matrix metalloproteinase 1 (interstitial collagenase)
217388_s_at   12.82        KYNU                kynureninase (L-kynurenine hydrolase)
205767_at     8.99         EREG                Epiregulin
201645_at     7.43         TNC                 tenascin C (hexabrachion)
204698_at     6.77         ISG20               Interferon stimulated gene 20kDa
205623_at     6.75         ALDH3A1             Aldehyde dehydrogenase 3 family, member A1
212091_s_at   6.35         COL6A1              collagen, type VI, alpha 1
213711_at     6.34         KRTHB1              keratin, hair, basic, 1
210663_s_at   6.29         KYNU                kynureninase (L-kynurenine hydrolase)
204748_at     6.23         PTGS2               prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase)
201720_s_at   5.83         LAPTM5              Lysosomal-associated multispanning membrane protein-5
203571_s_at   5.74         C10ORF116           chromosome 10 open reading frame 116, adipose specific 2
204205_at     5.29         APOBEC3G            apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3G
205463_s_at   5.02         PDGFA               platelet-derived growth factor alpha polypeptide
213194_at     4.86         ROBO1               roundabout, axon guidance receptor, homolog 1 (Drosophila)
212190_at     4.63         SERPINE2            serine (or cysteine) proteinase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 2
220217_x_at   4.56         SPANXC              SPANX family, member C
221009_s_at   4.56         ANGPTL4             angiopoietin-like 4
201564_s_at   4.55         FSCN1               fascin homolog 1, actin-bundling protein (Strongylocentrotus purpuratus)
216268_s_at   4.47         JAG1                jagged 1 (Alagille syndrome)
201417_at     4.45         SOX4                SRY (sex determining region Y)-box 4
220922_s_at   4.40         SPANXB1 /// SPANXC  SPANX family, member B1 /// SPANX family, member C
201288_at     4.26         ARHGDIB             Rho GDP dissociation inhibitor (GDI) beta
213428_s_at   4.24         COL6A1              collagen, type VI, alpha 1
220921_at     4.21         SPANXB1             SPANX family, member B1
33304_at      4.16         ISG20               Interferon stimulated gene 20kDa
205174_s_at   4.01         QPCT                glutaminyl-peptide cyclotransferase (glutaminyl cyclase)
210933_s_at   3.99         FSCN1               fascin homolog 1, actin-bundling protein (Strongylocentrotus purpuratus)
204470_at     3.89         CXCL1               chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha)
201069_at     3.85         MMP2                matrix metalloproteinase 2 (gelatinase A, 72kDa gelatinase, 72kDa type IV collagenase)
205399_at     3.76         DCAMKL1             doublecortin and CaM kinase-like 1
201061_s_at   3.71         STOM                Stomatin
221902_at     3.62         GPR153              G protein-coupled receptor 153
221760_at     3.59         MAN1A1              mannosidase, alpha, class 1A, member 1
219563_at     3.57         C14orf139           chromosome 14 open reading frame 139
211368_s_at   3.54         CASP1               caspase 1, apoptosis-related cysteine protease (interleukin 1, beta, convertase)
209030_s_at   3.42         IGSF4               immunoglobulin superfamily, member 4
202728_s_at   3.41         LTBP1               latent transforming growth factor beta binding protein 1
204385_at     3.24         KYNU                kynureninase (L-kynurenine hydrolase)
209505_at     3.24         NR2F1               nuclear receptor subfamily 2, group F, member 1
201325_s_at   3.21         EMP1                epithelial membrane protein 1
201721_s_at   3.21         LAPTM5              Lysosomal-associated multispanning membrane protein-5
206097_at     3.17         SLC22A1LS           solute carrier family 22 (organic cation transporter), member 1-like antisense
201324_at     3.15         EMP1                epithelial membrane protein 1
203417_at     3.12         MFAP2               microfibrillar-associated protein 2
208937_s_at   3.10         ID1                 inhibitor of DNA binding 1, dominant negative helix-loop-helix protein
219911_s_at   3.10         SLCO4A1             solute carrier organic anion transporter family, member 4A1
222182_s_at   3.07         CNOT2               CCR4-NOT transcription complex, subunit 2
222103_at     3.07         ATF1                Activating transcription factor 1
203585_at     3.06         ZNF185              zinc finger protein 185 (LIM domain)
221911_at     3.02         LOC221810           hypothetical protein LOC221810
216488_s_at                ATP11A              ATPase, Class VI, type 11A
205017_s_at                MBNL2               muscleblind-like 2 (Drosophila)
210046_s_at                IDH2                isocitrate dehydrogenase 2 (NADP+), mitochondrial
213075_at                  OLFML2A             olfactomedin-like 2A
202149_at                  NEDD9               neural precursor cell expressed, developmentally down-regulated 9
202610_s_at                CRSP2               cofactor required for Sp1 transcriptional activation, subunit 2, 150kDa
210340_s_at                CSF2RA              colony stimulating factor 2 receptor, alpha, low-affinity (granulocyte-macrophage)
221011_s_at                LBH                 likely ortholog of mouse limb-bud and heart gene /// likely ortholog of mouse limb-bud and heart gene
219959_at                  MOCOS               molybdenum cofactor sulfurase
213537_at                  HLA-DPA1            major histocompatibility complex, class II, DP alpha 1


202237_at
206473_at
201428_at
201843_s_at
202017_at
202688_at
205018_s_at
203387_s_at
212372_at
205805_s_at
216060_s_at
203974_at
204149_s_at
210136_at
214040_s_at
213067_at
207379_at
201137_s_at
208306_x_at
215193_x_at
202986_at
206814_at
204070_at
202238_s_at
[illegible]
207620_s_at
211990_at
202350_s_at
211907_s_at
207214_at
211839_s_at
208209_s_at
202145_at
211991_s_at
204238_s_at
208161_s_at
209201_x_at
210140_at
212942_s_at
217028_at
214827_at



nicotinamide  N-methyltransferase  NNMT 


membrane-bound  transcription  factor  protease,  site  2  MBTPS2 


claudin  4  CLDN4 


EGF-containing fibulin-like extracellular matrix protein 1 EFEMP1


epoxide hydrolase 1, microsomal (xenobiotic) EPHX1


tumor necrosis factor (ligand) superfamily, member 10 TNFSF10


muscleblind-like  2  (Drosophila)  MBNL2 


TBC1  domain  family,  member  4  TBC1D4 


myosin, heavy polypeptide 10, non-muscle MYH10


receptor  tyrosine  kinase-like  orphan  receptor  1  ROR1 


dishevelled  associated  activator  of  morphogenesis  1  DAAM1 


haloacid dehalogenase-like hydrolase domain containing 1A HDHD1A


glutathione S-transferase M4 GSTM4


LOC388483 


gelsolin (amyloidosis, Finnish type) GSN


myosin, heavy polypeptide 10, non-muscle MYH10


EGF-like repeats and discoidin I-like domains 3 EDIL3


major  histocompatibility  complex,  class  II,  DP  beta  1  HLA-DPB1 


major  histocompatibility  complex,  class  II,  DR  beta  3  HLA-DRB3 


major  histocompatibility  complex,  class  II,  DR  beta  3  HLA-DRB3 


aryl-hydrocarbon  receptor  nuclear  translocator  2  ARNT2 


nerve  growth  factor,  beta  polypeptide  NGFB 


retinoic  acid  receptor  responder  (tazarotene  induced)  3  RARRES3 


nicotinamide  N-methyltransferase  NNMT 


EGF-containing fibulin-like extracellular matrix protein 1 EFEMP1


calcium/calmodulin-dependent  serine  protein  kinase  (MAGUK 
family) 


Major  histocompatibility  complex,  class  II,  DP  alpha  1 


matrilin  2  MATN2 


par-6 partitioning defective 6 homolog beta (C. elegans) /// par-6
partitioning defective 6 homolog beta (C. elegans) PARD6B


serine  protease  inhibitor,  Kazal  type  4  SPINK4 


colony  stimulating  factor  1  (macrophage)  CSF1 


complement  component  4  binding  protein,  beta  C4BPB 


lymphocyte  antigen  6  complex,  locus  E  LY6E 


major  histocompatibility  complex,  class  II,  DP  alpha  1  HLA-DPA1 


chromosome  6  open  reading  frame  108  C6orf108 


acetylserotonin  O-methyltransferase-like  ASMTL 


0.09  ATP-binding  cassette,  sub-family  C  (CFTR/MRP),  member  3  ABCC3 


chemokine  (C-X-C  motif)  receptor  4  CXCR4 


cystatin F (leukocystatin) CST7


KIAA1199  KIAA1199 


chemokine  (C-X-C  motif)  receptor  4  CXCR4 


par-6 partitioning defective 6 homolog beta (C. elegans) PARD6B


E0.07 


Supplementary  Table  3:  Overlapping  genes  between  lung  and  bone 
metastasis  signatures.  The  113  probe  sets  (95  unique  genes)  from 
Supplementary  Table  2  were  overlapped  with  the  127  probe  sets  (102  unique 
genes)  previously  identified  as  the  gene-expression  signature  of  MDA-MB-231 
cell  populations  that  are  highly  metastatic  to  bone.  Shown  are  9  intersecting 
genes (11 probe sets) and whether each is up-regulated or down-regulated in
either  the  bone  metastasis  signature  or  the  lung  metastasis  signature. 


Probe set | Description | Gene symbol | Bone | Lung
201417_at | SRY (sex determining region Y)-box 4 | SOX4 | down | up
203571_s_at | adipose specific 2 | C10orf116 | down | up
208161_s_at | ATP-binding cassette, sub-family C (CFTR/MRP), 3 | ABCC3 | down | down
211991_s_at | major histocompatibility complex, class II, DP alpha 1 | HLA-DPA1 | down | down
219563_at | chromosome 14 open reading frame 139 | C14orf139 | [illegible] | [illegible]
204475_at | matrix metalloproteinase 1 (interstitial collagenase) | MMP1 | [illegible] | [illegible]
209201_x_at | Chemokine (C-X-C motif) receptor 4 | CXCR4 | up | down
220921_at | sperm protein associated with the nucleus, X chromosome, family member A1 | SPANXA1 | [illegible] | [illegible]
220922_s_at | sperm protein associated with the nucleus, X chromosome, family member A1 | SPANXA1 | [illegible] | [illegible]
215193_x_at | major histocompatibility complex, class II, DR beta 1 | HLA-DRB1 | down | down
201137_s_at | major histocompatibility complex, class II, DP beta 1 | HLA-DPB1 | down | down
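
The overlap described in the caption of Supplementary Table 3 amounts to intersecting two probe-set lists and recording the direction of change in each signature. The following is a minimal Python sketch of that bookkeeping, not the authors' procedure. It assumes each signature is stored as a dictionary from probe set ID to a (gene symbol, direction) pair; the function name `overlap_signatures`, the dictionary layout, and the two placeholder probe sets (`000000_at`, `111111_at`) are illustrative assumptions. The three real entries use directions that are legible in the table.

```python
def overlap_signatures(bone, lung):
    """Return probe sets present in both signatures, with both directions."""
    shared = {}
    # Set intersection of the dictionary keys gives the overlapping probe sets.
    for probe in sorted(bone.keys() & lung.keys()):
        gene, bone_dir = bone[probe]
        _, lung_dir = lung[probe]
        shared[probe] = (gene, bone_dir, lung_dir)
    return shared


# Three legible rows from the table plus one hypothetical non-shared
# probe set per signature (placeholders, not real Affymetrix IDs).
bone_signature = {
    "201417_at": ("SOX4", "down"),
    "208161_s_at": ("ABCC3", "down"),
    "209201_x_at": ("CXCR4", "up"),
    "000000_at": ("HYPOTHETICAL_BONE_ONLY", "up"),
}
lung_signature = {
    "201417_at": ("SOX4", "up"),
    "208161_s_at": ("ABCC3", "down"),
    "209201_x_at": ("CXCR4", "down"),
    "111111_at": ("HYPOTHETICAL_LUNG_ONLY", "up"),
}

shared = overlap_signatures(bone_signature, lung_signature)
# Probe sets whose direction of change differs between the two signatures.
discordant = [p for p, (_, b, l) in shared.items() if b != l]
```

Keying on probe set ID rather than gene symbol keeps probe sets for the same gene as separate entries, which is how the table reports more probe sets than unique genes.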


Supplementary  Table  4:  Lung  metastasis  candidate  genes.  Shown  are  54 
unique  genes  from  65  Affymetrix  probe  sets  representing  genes  associated  with 
lung  metastagenicity  and  virulence.  Overexpressed  fold  change  (yellow)  and 
underexpressed  fold  change  (blue)  from  comparing  parental  MDA-MB-231  and 
the  LM2  cell  lines  are  indicated. 


Probe set | Fold Change | Gene Title | Gene symbol
200665_s_at, 212667_at | 407.01 | secreted protein, acidic, cysteine-rich (osteonectin) | SPARC
206172_at | 48.52 | interleukin 13 receptor, alpha 2 | IL13RA2
206785_s_at | 33.05 | killer cell lectin-like receptor subfamily C, member 1 /// killer cell lectin-like receptor subfamily C, member 2 | KLRC1 /// KLRC2
204475_at | 13.35 | matrix metalloproteinase 1 (interstitial collagenase) | MMP1
217388_s_at, 210663_s_at | 12.82 | kynureninase (L-kynurenine hydrolase) | KYNU
205767_at | 8.99 | Epiregulin | EREG
201645_at | 7.43 | tenascin C (hexabrachion) | TNC
204698_at | 6.77 | interferon stimulated gene 20kDa | ISG20
205623_at | 6.75 | aldehyde dehydrogenase 3 family, member A1 | ALDH3A1
213711_at | 6.34 | keratin, hair, basic, 1 | KRTHB1
204748_at | 6.23 | prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase) | PTGS2
201720_s_at, 201721_s_at | 5.83 | Lysosomal-associated multispanning membrane protein-5 | LAPTM5
203571_s_at | 5.74 | chromosome 10 open reading frame 116, adipose specific 2 | C10orf116
213194_at | 4.86 | roundabout, axon guidance receptor, homolog 1 (Drosophila) | ROBO1
220217_x_at | 4.56 | SPANX family, member C | SPANXC
221009_s_at | 4.56 | angiopoietin-like 4 | ANGPTL4
201564_s_at, 210933_s_at | 4.55 | fascin homolog 1, actin-bundling protein (Strongylocentrotus purpuratus) | FSCN1
201417_at, 201416_at | 4.45 | SRY (sex determining region Y)-box 4 | SOX4
220922_s_at, 220921_at | 4.40 | SPANX family, member B1 /// SPANX family, member C | SPANXB1 /// SPANXC
213428_s_at | 4.24 | collagen, type VI, alpha 1 | COL6A1
204470_at | 3.89 | chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha) | CXCL1
201069_at | 3.85 | matrix metalloproteinase 2 (gelatinase A, 72kDa gelatinase, 72kDa type IV collagenase) | MMP2
201061_s_at | 3.71 | Stomatin | STOM
221902_at | 3.62 | G protein-coupled receptor 153 | GPR153
221760_at | 3.59 | mannosidase, alpha, class 1A, member 1 | MAN1A1
219563_at | 3.57 | chromosome 14 open reading frame 139 | C14orf139
211368_s_at | 3.54 | caspase 1, apoptosis-related cysteine protease (interleukin 1, beta, convertase) | CASP1
209030_s_at | 3.42 | immunoglobulin superfamily, member 4 | IGSF4
202728_s_at | 3.41 | latent transforming growth factor beta binding protein 1 | LTBP1
209505_at | 3.24 | nuclear receptor subfamily 2, group F, member 1 | NR2F1
201325_s_at, 201324_at | 3.21 | epithelial membrane protein 1 | EMP1
208937_s_at | 3.10 | inhibitor of DNA binding 1, dominant negative helix-loop-helix protein | ID1
222182_s_at | 3.07 | CCR4-NOT transcription complex, subunit 2 | CNOT2
203868_s_at | 2.17 | vascular cell adhesion molecule 1 | VCAM1
213075_at | [illegible] | olfactomedin-like 2A | OLFML2A
202149_at | [illegible] | neural precursor cell expressed, developmentally down-regulated 9 | NEDD9
210340_s_at | [illegible] | colony stimulating factor 2 receptor, alpha, low-affinity (granulocyte-macrophage) | CSF2RA
219959_at | [illegible] | molybdenum cofactor sulfurase | MOCOS
202017_at | [illegible] | epoxide hydrolase 1, microsomal (xenobiotic) | EPHX1
205018_s_at, 205017_s_at | [illegible] | muscleblind-like 2 (Drosophila) | MBNL2
210136_at | [illegible] | LOC388483 | —
214040_s_at | [illegible] | gelsolin (amyloidosis, Finnish type) | GSN
213067_at | [illegible] | myosin, heavy polypeptide 10, non-muscle | MYH10
202986_at | [illegible] | aryl-hydrocarbon receptor nuclear translocator 2 | ARNT2
204070_at | [illegible] | retinoic acid receptor responder (tazarotene induced) 3 | RARRES3
201842_s_at, 201843_s_at | [illegible] | EGF-containing fibulin-like extracellular matrix protein 1 | EFEMP1
202350_s_at | [illegible] | matrilin 2 | MATN2
202145_at | [illegible] | lymphocyte antigen 6 complex, locus E | LY6E
211991_s_at, 213537_at | 0.13 | major histocompatibility complex, class II, DP alpha 1 | HLA-DPA1
209394_at | [illegible] | acetylserotonin O-methyltransferase-like | ASMTL
208161_s_at | [illegible] | ATP-binding cassette, sub-family C (CFTR/MRP), member 3 | ABCC3
212942_s_at | [illegible] | KIAA1199 | KIAA1199
[illegible] | [illegible] | chemokine (C-X-C motif) receptor 4 | CXCR4
214827_at | [illegible] | par-6 partitioning defective 6 homolog beta (C. elegans) | PARD6B


Supplementary  Table  5.  Expression  of  Genes  in  the  Lung  Metastasis 
Signature  Correlated  to  Lung  Metastasis-Free  Survival  in  Breast  Cancer 
Patients.  A  Cox  proportional  hazards  model  was  used  to  relate  gene  expression 
changes  of  the  54  gene  lung  metastasis  signature  to  lung  metastasis-free 
survival  in  82  breast  cancer  patients. 


Probe set | Gene symbol | Hazard Ratio | Lower 95% | Upper 95% | p-value
204070_at | RARRES3 | 0.291 | 0.00001
221009_s_at | ANGPTL4 | 1.661 | 0.00005
203571_s_at | C10orf116 | 0.608 | 0.467 | 0.792 | 0.00047
202728_s_at | LTBP1 | 3.364 | 1.467 | 7.711 | 0.00074
205017_s_at | MBNL2 | 3.133 | 1.357 | 7.231 | 0.00169
201564_s_at | FSCN1 | 1.975 | 1.280 | 3.047 | 0.00201
201324_at | EMP1 | 2.997 | 1.411 | 6.369 | 0.00272
210340_s_at | CSF2RA | 1.805 | 1.212 | 2.687 | 0.00283
204475_at | MMP1 | 1.313 | 1.064 | 1.619 | 0.00742
212942_s_at | KIAA1199 | [illegible] | 1.076 | 2.431 | 0.02083
204470_at | CXCL1 | 1.076 | 1.708 | 0.02191
204748_at | PTGS2 | 1.451 | 1.030 | 2.043 | 0.02628
202986_at | ARNT2 | 0.746 | 1.026 | 0.06494
213067_at | MYH10 | 0.674 | 1.060 | 0.06899
213075_at | OLFML2A | 0.434 | 0.165 | 1.139 | 0.07305
222182_s_at | CNOT2 | 0.365 | 0.120 | 1.108 | 0.07775
206785_s_at | KLRC1 | 0.752 | 0.544 | 1.040 | 0.08261
208161_s_at | ABCC3 | 0.776 | 0.574 | 1.048 | 0.10283
202145_at | LY6E | 0.704 | 0.437 | 1.136 | 0.13893
202017_at | EPHX1 | 0.678 | 0.387 | 1.186 | 0.17169
209505_at | NR2F1 | 0.806 | 0.579 | 1.121 | 0.21238
210663_s_at | KYNU | 1.235 | 0.887 | [illegible] | 0.21883
210136_at | MBP | 1.431 | 0.809 | 0.22674
219959_at | MOCOS | 1.359 | 0.830 | 2.226 | 0.23861
201061_s_at | STOM | 0.613 | 0.267 | 1.408 | 0.24098
213428_s_at | COL6A1 | 1.542 | 0.722 | 3.293 | 0.25386
219563_at | C14orf139 | 0.657 | 0.319 | 1.355 | 0.25881
220217_x_at | SPANXC | 0.773 | 0.474 | 1.261 | 0.28465
213537_at | HLA-DPA1 | 0.786 | 0.493 | 1.253 | 0.33430
213711_at | KRTHB1 | 1.100 | 0.899 | 1.347 | 0.36209
201645_at | TNC | 1.195 | 0.805 | 1.772 | 0.37407
201721_s_at | LAPTM5 | 1.305 | 0.634 | 2.687 | 0.48354
201842_s_at | EFEMP1 | 0.865 | 0.570 | 1.313 | 0.49742
213194_at | ROBO1 | 1.216 | 0.699 | 2.113 | 0.49865
214040_s_at | GSN | 1.167 | 0.717 | 1.901 | 0.51734
220921_at | SPANXB1 | 0.892 | 0.612 | 1.301 | 0.54461
209030_s_at | IGSF4 | 0.300 | 1.899 | 0.54672
202350_s_at | MATN2 | 0.658 | 1.252 | 0.55728
208937_s_at | ID1 | 1.156 | 0.716 | 1.866 | 0.56958
209394_at | ASMTL | 0.816 | 0.400 | 1.667 | 0.58735
221760_at | MAN1A1 | 0.890 | 0.522 | 1.519 | 0.66920
205767_at | EREG | 1.058 | 0.814 | 1.374 | 0.67603
206172_at | IL13RA2 | 1.061 | 0.691 | 1.629 | 0.78848
211368_s_at | CASP1 | 1.065 | 0.663 | 1.710 | 0.79193
201069_at | MMP2 | 1.079 | 0.592 | 1.966 | 0.80346
203868_s_at | VCAM1 | 1.065 | 0.576 | 1.969 | 0.83993
204698_at | ISG20 | 0.973 | 0.743 | 1.273 | 0.84223
205623_at | ALDH3A1 | 0.957 | 1.531 | 0.85511
201416_at | SOX4 | 0.941 | [illegible] | 1.913 | 0.86571
214827_at | PARD6B | 0.972 | 0.648 | 1.458 | 0.88897
217028_at | CXCR4 | 0.953 | 0.482 | 1.884 | 0.88906
221902_at | GPR153 | 0.964 | 0.524 | 1.773 | 0.90587
212667_at | SPARC | 0.969 | 0.489 | 1.922 | 0.92818
202149_at | NEDD9 | 1.033 | 0.510 | 2.092 | 0.92853

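
For readers unfamiliar with how a Cox model coefficient relates to the hazard ratios and 95% limits reported in Supplementary Table 5, the standard relationship is HR = exp(beta), with a Wald-type interval exp(beta ± 1.96·SE). The sketch below illustrates this in Python. The coefficient and standard error passed to it are hypothetical, the function name `hazard_ratio_ci` is illustrative, and the Wald interval is an assumption: the source does not state how its intervals were computed.

```python
import math


def hazard_ratio_ci(beta, se, z=1.96):
    """Convert a Cox coefficient and its standard error to (HR, lower, upper)."""
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return hr, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
hr, lower, upper = hazard_ratio_ci(beta=0.5, se=0.2)

# A Wald interval is symmetric on the log scale, so the hazard ratio is the
# geometric mean of its limits. The fully legible C10orf116 row of the table
# (0.608, 0.467, 0.792) is consistent with this.
c10orf116_check = round(math.sqrt(0.467 * 0.792), 3)
```

A hazard ratio above 1 with a lower limit above 1 indicates that higher expression is associated with shorter lung metastasis-free survival. An interval that spans 1 does not support an association in either direction.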

Supplementary  Table  6.  Lung  Metastasis  Signature  Genes  Used  to  Classify 
Primary  Breast  Cancers  Expressing  the  Lung  Metastasis  Signature.  All 
genes  from  Table  1  are  shown. 


p-value | UG cluster | Gene symbol | Description
<0.000001 | Hs.118400 | FSCN1 | Fascin homolog 1, actin-bundling protein (Strongylocentrotus purpuratus)
<0.000001 | Hs.83169 | MMP1 | Matrix metalloproteinase 1 (interstitial collagenase)
<0.000001 | Hs.9613 | ANGPTL4 | Angiopoietin-like 4
0.000006 | Hs.74120 | C10orf116 | Chromosome 10 open reading frame 116
0.00002 | Hs.789 | CXCL1 | Chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha)
0.000355 | Hs.196384 | PTGS2 | Prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase)
0.000444 | Hs.185568 | KRTHB1 | Keratin, hair, basic, 1
0.000506 | Hs.109225 | VCAM1 | Vascular cell adhesion molecule 1
0.000627 | Hs.17466 | RARRES3 | Retinoic acid receptor responder (tazarotene induced) 3
0.001263 | Hs.368256 | LTBP1 | Latent transforming growth factor beta binding protein 1
0.004365 | Hs.444471 | KYNU | Kynureninase (L-kynurenine hydrolase)
0.005179 | Hs.421986 | CXCR4 | Chemokine (C-X-C motif) receptor 4
0.006426 | Hs.77667 | LY6E | Lymphocyte antigen 6 complex, locus E
0.007153 | Hs.410900 | ID1 | Inhibitor of DNA binding 1, dominant negative helix-loop-helix protein
0.010871 | Hs.255149 | MAN1A1 | Mannosidase, alpha, class 1A, member 1
0.032361 | Hs.388589 | NEDD9 | Neural precursor cell expressed, developmentally down-regulated 9
0.03713 | Hs.115263 | EREG | Epiregulin
0.046859 | Hs.98998 | TNC | Tenascin C (hexabrachion)
0.053773 | Hs.357901 | SOX4 | SRY (sex determining region Y)-box 4
0.05492 | Hs.157986 | MOCOS | Molybdenum cofactor sulfurase
0.062067 | Hs.165725 | CNOT2 | CCR4-NOT transcription complex, subunit 2
0.071707 | Hs.436200 | LAPTM5 | Lysosomal-associated multispanning membrane protein-5
0.079271 | Hs.153647 | MATN2 | Matrilin 2
0.080391 | Hs.156682 | IGSF4 | Immunoglobulin superfamily, member 4
0.096189 | Hs.306692 | EMP1 | Epithelial membrane protein 1
0.097858 | Hs.105434 | ISG20 | Interferon stimulated gene 20kDa
0.119096 | Hs.280311 | MYH10 | Myosin, heavy polypeptide 10, non-muscle
0.124785 | Hs.301198 | ROBO1 | Roundabout, axon guidance receptor, homolog 1 (Drosophila)
0.213167 | Hs.361748 | NR2F1 | Nuclear receptor subfamily 2, group F, member 1
0.230817 | Hs.125715 | MBNL2 | Muscleblind-like 2 (Drosophila)
0.25087 | Hs.367877 | MMP2 | MMP2
0.254227 | Hs.446537 | GSN | Gelsolin (amyloidosis, Finnish type)
0.255766 | Hs.531581 | GPR153 | G protein-coupled receptor 153
0.274128 | Hs.336046 | IL13RA2 | Interleukin 13 receptor, alpha 2
0.345846 | Hs.357004 | OLFML2A | Olfactomedin-like 2A
0.36839 | Hs.6111 | ARNT2 | Aryl-hydrocarbon receptor nuclear translocator 2
0.423864 | Hs.111779 | SPARC | Secreted protein, acidic, cysteine-rich (osteonectin)
0.507582 | Hs.2490 | CASP1 | Caspase 1, apoptosis-related cysteine protease (interleukin 1, beta, convertase)
0.650845 | Hs.76224 | EFEMP1 | EGF-containing fibulin-like extracellular matrix protein 1
0.75516 | Hs.520937 | CSF2RA | Colony stimulating factor 2 receptor, alpha, low-affinity (granulocyte-macrophage)
0.764736 | Hs.439776 | STOM | Stomatin
0.830009 | Hs.512576 | KLRC1 | Killer cell lectin-like receptor subfamily C, member 1
0.830451 | Hs.415997 | COL6A1 | Collagen, type VI, alpha 1
0.843369 | Hs.458420 | ASMTL | Acetylserotonin O-methyltransferase-like
0.846476 | Hs.575 | ALDH3A1 | Aldehyde dehydrogenase 3 family, member A1
0.867387 | Hs.89649 | EPHX1 | Epoxide hydrolase 1, microsomal (xenobiotic)
0.899238 | Hs.90786 | ABCC3 | ATP-binding cassette, sub-family C (CFTR/MRP), member 3
0.926966 | Hs.914 | HLA-DPA1 | Major histocompatibility complex, class II, DP alpha 1


Supplementary  Figure  Legends 


Supplementary Figure S1. Single cell-derived progenies (SCPs) of MDA-
MB-231 cells have a uniform Rosetta-type poor prognosis gene signature
and variation in gene expression correlating with metastatic behavior.

Fifty-four of the 70 Rosetta poor prognosis genes were present on the Affymetrix
U133A microarray platform and performed comparably to the original 70 genes in
predicting five-year metastasis-free survival (Supplementary Methods). The 54
genes successfully classified patients in the MSKCC cohort with at least five
years of clinical follow-up (77% correct classification of poor prognosis and 59%
correct classification of good prognosis, p=0.001) and were used to assign all 82
patients to good versus poor prognosis groups. a, The gene expression
centroids for the MSKCC good prognosis and poor prognosis groups are shown at
the top of the heatmap. Below them is the expression of each of the 54 shared
Rosetta poor prognosis genes for the SCPs. For presentation purposes, the
intensity of the good prognosis and poor prognosis centroids was increased by a
factor of six to more closely match the overall intensity of the cell line data. The
gene expression data is median centered with yellow being up-regulated and
blue being down-regulated. Genes overexpressed (red bar) and underexpressed
(green bar) in poor prognosis tumors are shown on the bottom. b, Uniformity in
the expression of a Rosetta-type poor prognosis signature is shown using a
pairwise Pearson correlation comparing this signature among the SCPs and
indicated MDA-MB-231 cell lines. ATCC refers to parental MDA-MB-231 cells. c,
Variation in gene expression among SCPs is represented in three dimensions
using multi-dimensional scaling and reveals three distinct groups with similarities
in gene expression. d, Bioluminescence imaging (BLI) of representative SCPs
from each of the three groups taken 7 weeks after tail vein or intracardiac
xenografting.
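
The pairwise Pearson comparison mentioned for panel b can be sketched as follows. This is a minimal, self-contained Python illustration rather than the authors' analysis code. It assumes each sample is a list of expression values over the same ordered set of signature genes; the sample names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def pairwise_pearson(profiles):
    """Correlation for every ordered pair of samples, keyed by (name, name)."""
    names = list(profiles)
    return {(a, b): pearson(profiles[a], profiles[b])
            for a in names for b in names}


# Hypothetical expression profiles over four signature genes.
profiles = {
    "SCP_A": [1.0, 2.0, 3.0, 4.0],
    "SCP_B": [2.0, 4.0, 6.0, 8.0],   # same pattern as SCP_A, scaled
    "SCP_C": [4.0, 3.0, 2.0, 1.0],   # reversed pattern
}
corr = pairwise_pearson(profiles)
```

Correlations near 1 across all pairs would correspond to the uniformity described in the legend. Because Pearson correlation ignores scale and offset, SCP_A and SCP_B correlate perfectly despite different magnitudes.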


Supplementary  Figure  S2.  Confirmation  of  protein  expression  for  lung 
metastasis  signature  genes  used  in  functional  validation.  The  indicated 
MDA-MB-231  in  vivo  selected  populations  were  analyzed  by  a,  Western  blotting 
for  SPARC  and  ID1,  b,  ELISA  for  MMP1  and  MMP2,  or  by  c,  flow  cytometry 
analysis  for  VCAM1  and  IL13Ra2  staining. 

Supplementary  Figure  S3.  Validation  of  combination  transgenic  parental 
MDA-MB-231  cell  lines  transduced  with  lung  metastasis  genes.  Parental 
MDA-MB-231  cells  were  retrovirally  transduced.  Northern  blot  analysis  identifies 
exogenous transcripts for a, SPARC, ID1, and MMP1, b, VCAM1, IL13Ra2, and
MMP2,  or  c,  CXCL1,  EREG,  and  COX2.  These  genes  were  expressed  either 
individually  (which  is  shown  for  SPARC,  ID1,  and  MMP1),  or  in  combinations  of 
three  or  six.  Puro  represents  the  empty  vector  control. 

Supplementary  Figure  S4.  Parental  MDA-MB-231  cells  overexpressing  lung 
metastasis  genes  are  not  enhanced  in  bone  metastatic  activity.  Parental 
MDA-MB-231  cells  retrovirally  transduced  with  vector  controls  or  various 
combinations  of  lung  metastasis  genes,  and  highly  bone  metastatic  1833  cells 
were  injected  into  the  left  cardiac  ventricle  of  immunocompromised  mice. 
Bioluminescent  imaging  was  used  to  monitor  the  development  of  bone 
metastases.  Representative  mice  from  cohorts  of  5  animals  each  were  used  for 
presentation  purposes. 

Supplementary  Figure  S5.  Lung  metastasis  signature  genes  are  able  to 
distinguish  patients  at  high  risk  for  developing  lung  but  not  bone 
metastasis. Patients in the MSKCC cohort were classified using a linear
combination of each of the 54 lung metastasis signature genes. a, Each gene
was weighted by its estimated Cox model regression coefficient for either lung or
bone metastasis to classify patients into a clinical low-risk group (blue) or a high-
risk group (red and brown). b, Each of the 54 genes was weighted by a t-statistic
derived from comparing its expression between the LM2 cell lines and the parental
MDA-MB-231 cell lines to classify patients as being more similar to either the
parental cell lines (blue) or the LM2 cell lines (red and brown). Shown are
survival curves for lung metastasis-free survival (top) and bone metastasis-free
survival (bottom) with p-values.
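
The weighted linear combination described in this legend can be illustrated with a short Python sketch. It assumes the score for a patient is the sum over signature genes of weight times expression, where the weight is either a Cox regression coefficient (panel a) or a t-statistic (panel b). The median cut-off used to split patients into groups is an assumption made for illustration, since the legend does not state the threshold. Gene names, weights, and expression values are hypothetical.

```python
from statistics import median


def risk_scores(expression, weights):
    """Weighted sum of expression values over the signature genes, per patient."""
    return {
        patient: sum(weights[gene] * values[gene] for gene in weights)
        for patient, values in expression.items()
    }


def split_by_median(scores):
    """Label patients above the median score 'high' and the rest 'low'.

    The median threshold is an illustrative assumption, not the source's rule.
    """
    cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}


# Hypothetical per-gene weights and patient expression values.
weights = {"GENE_A": 0.5, "GENE_B": -1.0}
expression = {
    "patient_1": {"GENE_A": 2.0, "GENE_B": 0.0},
    "patient_2": {"GENE_A": 0.0, "GENE_B": 1.0},
    "patient_3": {"GENE_A": 1.0, "GENE_B": 0.5},
    "patient_4": {"GENE_A": 4.0, "GENE_B": 0.0},
}
scores = risk_scores(expression, weights)
groups = split_by_median(scores)
```

A negative weight, as for the hypothetical GENE_B, lowers the score when the gene is highly expressed, which is how a gene whose expression is associated with better outcome would enter such a score.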

Supplementary  Figure  S6.  Identification  of  a  subgroup  of  primary  breast 
cancers  that  express  the  lung  metastasis  signature  in  the  Rosetta  data  set. 

Hierarchical clustering of primary breast carcinomas from a cohort of 77 breast
cancer patients (ref. 9) was performed using 48 lung metastasis candidate genes that
mapped to the Rosetta microarray (ref. 9). A dendrogram resulting from clustering of
the tumors is shown at the top, with tumors from patients that developed
metastasis denoted by black circles. The rows corresponding to the nine lung
metastasis genes that were functionally validated in mice are shown in greater
detail (middle panel) with the names of each gene on the right. The Rosetta
poor-prognosis signature for each of these tumors is displayed with genes that are
overexpressed (red bar) and underexpressed (green bar) in poor prognosis
tumors indicated on the left. Expression of HER2, estrogen
receptor/progesterone receptor status, and basal and luminal keratins is also
shown (ref. 27). The gene expression data is centered with red/gold indicating
up-regulation and green/blue indicating down-regulation. A sub-cluster with a
cluster reproducibility index of 0.81 (dashed red box) groups tumors that tended
to express the lung metastasis signature in a manner resembling the LM2 cell
lines.

Supplementary  Figure  S7.  Classification  of  primary  breast  cancers  that 
express the lung metastasis signature used in class prediction training.

K-means support clustering was used to partition the breast primaries from the
Rosetta  data  set  into  four  clusters  (see  Supplementary  methods  section).  Shown 
are the lung metastasis gene-expression signature centroids for each of four
consensus clusters. Cluster 0 refers to patients that were unassigned to any of
the four clusters. Also shown are the centroids for the LM2 cell lines (4173,
4175, 4180) and two different passages of the parental MDA-MB-231 cell line
(ATCC).  Similarity  of  each  consensus  cluster  to  the  LM2  cell  line  is  visualized  by 
hierarchical  clustering  and  the  Pearson  correlation  values  are  shown  in  the  table 
below  the  heatmap.  The  names  of  the  48  lung  metastasis  signature  genes  that 
mapped  to  the  Rosetta  microarray  platform  are  shown  on  the  right,  with  the 
genes  that  were  functionally  validated  shown  in  red.  Yellow  represents  up- 
regulated  genes,  and  blue  represents  down-regulated  genes.  Members  of 
cluster  3  were  defined  as  a  robust  subgroup  of  tumors  expressing  the  lung 
metastasis  signature  and  all  other  samples  were  labeled  as  not  expressing  this 
signature.  These  class  labels  were  used  to  train  a  classifier. 


Rosetta Poor Prognosis Gene Signature (van't Veer et al.)




SPARC, ID1, MMP1 + CXCL1, Epireg, PTGS2
CXCL1, Epireg, PTGS2
Puro

SPARC, ID1, MMP1 + VCAM1, IL13RA2, MMP2
VCAM1, IL13RA2, MMP2
Puro


Supplementary Figure S3

Supplementary Figure S4

Supplementary Figure S7